This thesis explores a variant of the bag-of-visual-words framework that relies largely on unsupervised learning to predict the presence or absence of objects in images. We extract local image features from the PASCAL VOC 2007 dataset with different SIFT detector and descriptor implementations. Based on the bag-of-visual-words assumption, we quantize the local descriptors into visual words and aggregate them into visual word counts using Sculley's Mini-batch k-Means. Afterwards, we train Neural Networks with Replicated Softmax input layers and multilabel classification output layers. We also use Neural Networks with multiclass (softmax) output layers for the task of document classification.
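As a rough illustration of the quantization step described above, here is a minimal sketch using scikit-learn's `MiniBatchKMeans` (an implementation of Sculley's Mini-batch k-Means). The random arrays merely stand in for real SIFT descriptors; all names and sizes are illustrative assumptions, not the thesis code:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)

# Stand-in for 128-dimensional SIFT descriptors pooled from many images.
descriptors = rng.random((2000, 128))

# Mini-batch k-Means learns the visual vocabulary (here: 50 visual words).
kmeans = MiniBatchKMeans(n_clusters=50, batch_size=256, n_init=3, random_state=0)
kmeans.fit(descriptors)

def bovw_histogram(image_descriptors, kmeans):
    """Quantize one image's descriptors into a visual word count vector."""
    words = kmeans.predict(image_descriptors)
    return np.bincount(words, minlength=kmeans.n_clusters)

# One image with 300 local descriptors yields a 50-bin count histogram.
hist = bovw_histogram(rng.random((300, 128)), kmeans)
print(hist.shape)  # (50,)
print(hist.sum())  # 300
```

These count histograms are exactly the kind of input the Replicated Softmax layer consumes.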
Major contributions of this thesis include a detailed mathematical derivation and implementation of the Replicated Softmax (RSM) model of Salakhutdinov and Hinton, as well as a detailed mathematical derivation of Welling et al.'s Exponential Family Harmoniums (EFH). We report classification results on the 20 Newsgroups dataset that are competitive with, for example, the directed DiscLDA model. Moreover, Neural Networks with RSM input layers significantly outperform standard feed-forward Neural Networks on the PASCAL VOC 2007 image object classification challenge in terms of mean Average Precision.
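For readers unfamiliar with the RSM, the following is a minimal NumPy sketch of one CD-1 training step, assuming the standard formulation by Salakhutdinov and Hinton (softmax visible units replicated D times, so the hidden biases are scaled by the document length D). All sizes and names are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

K, H = 50, 16                       # vocabulary size, number of hidden units
W = 0.01 * rng.standard_normal((K, H))
b_v = np.zeros(K)                   # visible (word) biases
b_h = np.zeros(H)                   # hidden biases

def cd1_step(v, W, b_v, b_h, lr=0.01):
    """One CD-1 update on a single count vector v; updates W, b_v, b_h in place."""
    D = v.sum()                                  # document length scales hidden biases
    h_prob = sigmoid(v @ W + D * b_h)
    h_sample = (rng.random(H) < h_prob).astype(float)
    # Reconstruction: draw D words from the softmax over the vocabulary.
    v_prob = softmax(h_sample @ W.T + b_v)
    v_recon = rng.multinomial(int(D), v_prob).astype(float)
    h_recon = sigmoid(v_recon @ W + D * b_h)
    # Contrastive divergence gradient approximation (data term minus model term).
    W += lr * (np.outer(v, h_prob) - np.outer(v_recon, h_recon))
    b_v += lr * (v - v_recon)
    b_h += lr * D * (h_prob - h_recon)
    return float(np.abs(v - v_recon).sum())      # reconstruction error

# A toy "document": 100 word occurrences spread over the vocabulary.
v = rng.multinomial(100, np.ones(K) / K).astype(float)
err = cd1_step(v, W, b_v, b_h)
```

The key difference from a plain RBM is the length scaling `D * b_h` and the multinomial reconstruction, which is what makes the model suitable for count data such as visual word histograms.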
Moreover, we present the DualRSM model for image object classification, which adds a second wing of visible units to the RSM model and thus enables it to combine two different histogram inputs. In particular, we train it on visual word counts together with the corresponding histogram of all-to-all distances between the visual words. This is an attempt to incorporate information about the spatial relationships among the visual words, i.e., to mitigate the strongly simplifying bag-of-visual-words assumption.
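How two such wings might feed one shared hidden layer can be sketched as follows. Note that the bias scaling and all names and shapes here are assumptions for illustration only, not the actual DualRSM parameterization from the thesis:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
K1, K2, H = 50, 30, 16      # word-count wing, distance-histogram wing, hidden units
W1 = 0.01 * rng.standard_normal((K1, H))
W2 = 0.01 * rng.standard_normal((K2, H))
b_h = np.zeros(H)

def hidden_probs(v_words, v_dists):
    """Shared hidden layer conditioned on both visible wings.

    Each wing contributes its own weighted input; the hidden bias is scaled
    by the total count (an assumed, RSM-style length normalization).
    """
    D = v_words.sum() + v_dists.sum()
    return sigmoid(v_words @ W1 + v_dists @ W2 + D * b_h)

h = hidden_probs(rng.multinomial(100, np.ones(K1) / K1),
                 rng.multinomial(200, np.ones(K2) / K2))
```

The shared hidden units thus form a joint latent representation of appearance (word counts) and geometry (pairwise distance counts), which a classifier on top can exploit.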
Download: landthal2011.pdf, corrected
The Python implementation of the Replicated Softmax model can be found here: Replicated Softmax implementation.
Master's Thesis in Informatik, advised by Christian Osendorfer at TU München, I6.
You can reach me here: joerg [-at-] fylance [-.-] de