New!  Ground truth for test data
Patch for images and precomputed features (updated on 7/17/2010). Due to an error in our image collection process, a very small portion of the packaged images are blank images returned from websites where the original images had become unavailable. We found 970 such images (out of 1.2M) in training, 9 (out of 50K) in validation, and 19 (out of 150K) in test. Although this should not noticeably affect training or testing, we have released a patch that contains the correct images (6MB) and the correct precomputed features (80MB). Please go to the "images" and "features" download sections to download the patches.
To apply the patch to the images, simply replace the old images with the new ones from the patch. For the precomputed features, we provide a MATLAB program that modifies your old feature files. Please consult the readme files for details.
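Replacing the stale images amounts to copying each patched file over the old file of the same name. A minimal sketch in Python, assuming hypothetical directory names (adjust the paths to your local layout; the official MATLAB program should still be used for the feature files):

```python
import shutil
from pathlib import Path

def apply_image_patch(patch_dir: str, image_dir: str) -> int:
    """Copy each patched JPEG over the stale image of the same name.

    Returns the number of files replaced. Directory names are
    illustrative, not part of the official patch layout.
    """
    patch, images = Path(patch_dir), Path(image_dir)
    count = 0
    for patched in patch.glob("*.JPEG"):
        shutil.copy2(patched, images / patched.name)
        count += 1
    return count
```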
The development kit includes
- Meta data for the competition categories.
- Matlab routines for evaluating submissions.
- A demo implementing and evaluating a simple baseline system using precomputed SIFT [1,2] features and LIBLINEAR [3].
- Code for computing the features used in the baseline demo.
Please be sure to consult the readme file included in the development kit.
Development kit. 3MB.
Images
The training images are the same as the images in the ImageNet 2010 Spring Release. There are a total of 1,261,406 training images. The number of images per synset (category) ranges from 668 to 3047.
There are 50,000 validation images, with 50 images per synset.
All images are in JPEG format.
To download the images, please register first, even if you are not entering the competition.
Download links will be sent to you via email.
We have computed dense SIFT features for all images -- training, validation, and test. They are available for download (features for the test data will be made available later).
Each image is resized to have a maximum side length of 300 pixels (smaller images are not enlarged). SIFT descriptors are computed on 20x20 overlapping patches with a spacing of 10 pixels. Each image is then further downsized (to 1/2 and then 1/4 of its side length) and more descriptors are computed. We use the VLFeat implementation of dense SIFT (version 0.9.4.1).
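The resizing and patch-grid arithmetic above can be sketched as follows. This is only an illustration of the stated geometry (300-pixel maximum side, 20x20 patches, 10-pixel step); the released features were extracted with VLFeat's MATLAB dense-SIFT code, not this snippet, and the function names are made up for the example:

```python
def resized_shape(width: int, height: int, max_side: int = 300):
    """Scale so the longer side is max_side; never enlarge."""
    scale = min(1.0, max_side / max(width, height))
    return round(width * scale), round(height * scale)

def patch_grid(side: int, patch: int = 20, step: int = 10) -> int:
    """Number of patch positions along one side of the image."""
    return 0 if side < patch else (side - patch) // step + 1

# A 640x480 image shrinks to 300x225, giving a 29x21 grid of patches
# at the full scale (before the 1/2 and 1/4 downsized passes).
w, h = resized_shape(640, 480)            # -> (300, 225)
n_patches = patch_grid(w) * patch_grid(h)  # -> 29 * 21 = 609
```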
We then perform k-means clustering of a random subset of 10 million SIFT descriptors to form a visual vocabulary of 1000 visual words. Each SIFT descriptor is quantized into a visual word using the nearest cluster center.
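Quantization into visual words is a nearest-neighbor assignment against the 1000 k-means centers. A minimal NumPy sketch, with random data standing in for the real vocabulary (which was learned from ~10 million descriptors):

```python
import numpy as np

def quantize(descriptors: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Assign each SIFT descriptor to its nearest cluster center
    (visual word) under Euclidean distance."""
    # Squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.
    d2 = (
        (descriptors ** 2).sum(axis=1, keepdims=True)
        - 2.0 * descriptors @ centers.T
        + (centers ** 2).sum(axis=1)
    )
    return d2.argmin(axis=1)

# Toy example: random stand-ins for the vocabulary and one image's SIFT.
rng = np.random.default_rng(0)
vocab = rng.random((1000, 128))   # 1000 visual words, 128-dim SIFT space
desc = rng.random((5, 128))       # descriptors from one image
words = quantize(desc, vocab)     # one visual-word index per descriptor
```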
We provide both the raw SIFT features (vldsift) and the visual codewords (sbow). Spatial coordinates of each descriptor/codeword are also included.
To run the demo system included in the development kit, you need to download the visual words features (for training and validation). Note that the raw SIFT features are not needed to run the demo code.
Please consult the readme file in the development kit for more details.
New!  Patch for all features. 80MB.
Visual words (sbow) for training. 5.1GB. MD5: 0e0257af7a524aee89a2ce6246798a3f
Visual words (sbow) for validation. 205MB. MD5: b20164d925280b45219b51c2122cbd61
Visual words (sbow) for test. 613MB. MD5: de53389fd1972e2bb32cf5083efe01dc
Raw SIFT features (vldsift) for training. 375GB. MD5: aa2fdaa6fb119a451a23acd55bc57831
Raw SIFT features (vldsift) for validation. 15GB. MD5: c1b343347d8add28875332fc0f97e398
Raw SIFT features (vldsift) for test. 45GB. MD5: a3d348d9eba5db60ab1474ed10dd2bec
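After downloading, it is worth verifying each archive against the MD5 checksums listed above. A small sketch that streams the file so multi-gigabyte archives do not need to fit in memory (the filename in the comment is hypothetical):

```python
import hashlib

def md5sum(path: str, chunk: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading 1MB at a time."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Example (hypothetical local filename):
# assert md5sum("sbow_validation.tar") == "b20164d925280b45219b51c2122cbd61"
```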
[1] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004.
[2] A. Vedaldi and B. Fulkerson. VLFeat: An Open and Portable Library of Computer Vision Algorithms. 2008. http://www.vlfeat.org
[3] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research 9 (2008), 1871-1874. http://www.csie.ntu.edu.tw/~cjlin/liblinear/