This thesis explores a variant of the bag-of-visual-words framework with a large fraction of
unsupervised learning to predict the presence or absence of objects in images. We extract local
image features with different SIFT detector and descriptor implementations from the PASCAL
VOC 2007 dataset. Following the bag-of-visual-words assumption, we quantize these features into
visual words and count their occurrences per image using Sculley's Mini-batch k-Means. We then train
Neural Networks with a Replicated Softmax input layer and a multilabel classification output layer.
We also apply these Neural Networks, with a multiclass (softmax) output layer, to the task of
document classification.
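The quantization step can be sketched roughly as follows. This is a minimal illustration using scikit-learn's MiniBatchKMeans rather than the thesis code; the toy random "descriptors" and all variable names are assumptions for illustration only:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_vocabulary(descriptors, n_words=1000, batch_size=1000, seed=0):
    """Cluster local descriptors into a visual vocabulary with mini-batch k-means."""
    km = MiniBatchKMeans(n_clusters=n_words, batch_size=batch_size,
                         n_init=3, random_state=seed)
    km.fit(descriptors)
    return km

def image_histogram(km, image_descriptors):
    """Assign each descriptor to its nearest visual word and count occurrences."""
    words = km.predict(image_descriptors)
    return np.bincount(words, minlength=km.n_clusters)

# Toy example: random 128-d vectors stand in for real SIFT descriptors.
rng = np.random.default_rng(0)
all_desc = rng.random((500, 128))
km = build_vocabulary(all_desc, n_words=20, batch_size=100)
hist = image_histogram(km, all_desc[:50])  # visual word counts for one "image"
```

The resulting count vector per image is exactly the histogram input the later models consume.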
Major contributions of this thesis encompass a detailed mathematical derivation and implementation
of the Replicated Softmax (RSM) model presented by Salakhutdinov and Hinton, as
well as a detailed mathematical derivation of Welling et al.'s Exponential Family Harmoniums
(EFH). We report classification results on the 20 Newsgroups dataset that are competitive with,
for example, the directed DiscLDA model. Moreover, Neural Networks with RSM input layers significantly
outperform standard feed-forward Neural Networks on the PASCAL VOC 2007 image object
classification challenge (in terms of mean Average Precision).
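One alternating Gibbs step of the RSM, following Salakhutdinov and Hinton's formulation, can be sketched as below: the hidden bias is scaled by the document length D, and the visible side is a softmax over words sampled D times. Array names and the toy dimensions are assumptions, not the thesis implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rsm_gibbs_step(v, W, b_vis, b_hid, rng):
    """One alternating Gibbs step of the Replicated Softmax model.

    v: (K,) word-count vector, W: (K, F) weights, b_vis: (K,), b_hid: (F,).
    """
    D = int(v.sum())
    # Hidden units: p(h_j = 1 | v) = sigma(sum_k v_k W_kj + D * a_j)
    p_h = sigmoid(v @ W + D * b_hid)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # Visible units: softmax over the K words, sampled D times (multinomial)
    logits = W @ h + b_vis
    p_v = np.exp(logits - logits.max())
    p_v /= p_v.sum()
    v_new = rng.multinomial(D, p_v).astype(float)
    return v_new, h

rng = np.random.default_rng(1)
K, F = 10, 4                       # toy vocabulary and hidden sizes
W = rng.normal(scale=0.1, size=(K, F))
v0 = rng.multinomial(30, np.ones(K) / K).astype(float)  # a 30-word "document"
v1, h = rsm_gibbs_step(v0, W, np.zeros(K), np.zeros(F), rng)
```

Note that the reconstruction v1 always sums to the same document length D as the input, which is what makes the model well suited to count histograms of varying total size.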
Finally, we present the DualRSM model for image object classification, which adds a second
wing of visible units to the RSM model and hence enables it to combine two different histogram
inputs. In particular, we train it on visual word counts together with the corresponding histogram
of all-to-all distances between the visual words. This is an attempt to incorporate information
about the spatial relationships among the visual words, i.e. to mitigate the strongly simplifying
bag-of(-visual)-words assumption.
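The second input wing can be sketched as a histogram over all pairwise keypoint distances. This is a plain NumPy illustration; the bin count, range handling, and function name are assumptions rather than the thesis implementation:

```python
import numpy as np

def distance_histogram(keypoints, n_bins=16, max_dist=None):
    """Histogram of all-to-all Euclidean distances between keypoint locations.

    keypoints: (N, 2) array of (x, y) positions. Returns n_bins integer counts,
    a coarse summary of the spatial layout that plain bag-of-words discards.
    """
    diff = keypoints[:, None, :] - keypoints[None, :, :]
    dists = np.sqrt((diff ** 2).sum(-1))
    # Keep each unordered pair once (strict upper triangle).
    iu = np.triu_indices(len(keypoints), k=1)
    pair_dists = dists[iu]
    if max_dist is None:
        max_dist = pair_dists.max()
    counts, _ = np.histogram(pair_dists, bins=n_bins, range=(0.0, max_dist))
    return counts

rng = np.random.default_rng(2)
pts = rng.random((40, 2)) * 100    # toy keypoint coordinates in a 100x100 image
hist = distance_histogram(pts, n_bins=16)
```

Since this is again a count histogram of fixed total mass per image, it has the same form as the visual word counts and can feed the second visible wing directly.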
Download: landthal2011.pdf
The Python implementation of the Replicated Softmax model can be found here: Replicated Softmax implementation.
Master's thesis in Informatics, advised by Christian Osendorfer at TU München, I6.
You can reach me here: joerg [-at-] fylance [-.-] de