SmallNORB is a dataset used in machine learning. It shows toys from various angles and in different lighting.

It’s available here: https://cs.nyu.edu/~ylclab/data/norb-v1.0-small/

This post:

• parses the files’ headers and loads the data into numpy arrays
• draws a few of the images and builds an interactive tool for exploring them.

I’ve only tested this on Python 3.5 on a Mac. Other than that, it should only require numpy, jupyter, and matplotlib.

You might also be interested in https://github.com/ndrplz/small_norb/, which helped me figure out how to do this in Python.

This post is available as a blog post or a notebook.

## Overview of files

Before I start, here’s an overview of the dataset based on information from the dataset page (the source of truth for this dataset!).

There are a number of toys of different categories. Each toy is photographed from a fixed set of angles and under a fixed set of lighting conditions. The toys are grouped into 5 categories, such as ‘cars’. Each category has 10 toys, which are referred to as “instances.”

There are two sets of files, one for training and one for testing. The split is by instance: the 24,300 examples of instances 4, 6, 7, 8, 9 of all categories make up the training set (5 categories × 5 instances × 9 elevations × 18 azimuths × 6 lightings = 24,300), and the remaining instances (0, 1, 2, 3, 5) make up the test set.

Each set contains three files. Each file’s first dimension represents the 24,300 examples. The files are:

• dat: The images! An example is of size (2, 96, 96). This represents two (96, 96) images, one for each of two cameras.
• cat: An example is an int. This represents the category id.
• info: An example has shape (4,). This represents the metadata about lighting/angle/etc. Specifically:
    • instance: id [0-9] of which toy from the category it is.
    • elevation: id [0-8] representing the elevation of the camera (30 to 70 degrees in increments of 5).
    • azimuth: id [0-34] (only even numbers!). Multiplying by 10 gives the azimuth in degrees. Heads up that instances aren’t aligned! azimuth=0 for two toys will point in two different directions.
    • lighting: id [0-5] representing the lighting condition.
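
To make the id encodings above concrete, here’s a small sketch that converts info rows into degrees. The helper name `decode_info` is hypothetical, and it assumes the four columns come in the order listed above (instance, elevation, azimuth, lighting).

```python
import numpy as np


def decode_info(info):
    """Convert info ids into readable values.

    Assumes the last axis holds (instance, elevation, azimuth, lighting),
    in the order of the list above.
    """
    info = np.asarray(info)
    return {
        'instance': info[..., 0],
        # elevation id 0..8 maps to 30..70 degrees in steps of 5
        'elevation_deg': 30 + 5 * info[..., 1],
        # azimuth id 0, 2, ..., 34 maps to 0..340 degrees
        'azimuth_deg': 10 * info[..., 2],
        'lighting': info[..., 3],
    }


decoded = decode_info([4, 8, 34, 5])
print(decoded['elevation_deg'], decoded['azimuth_deg'])
```

Because the azimuths of different toys aren’t aligned, the decoded azimuth is only meaningful relative to other images of the same instance.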

The next code block downloads the 6 files from https://cs.nyu.edu/~ylclab/data/norb-v1.0-small/ and decompresses them into the DATA_FOLDER location. If you already have the decompressed files, you can point DATA_FOLDER to their folder.

Heads up, the dat files are large. This may take a while.
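
A minimal sketch of that download step, using only the standard library, could look like this. The value of `DATA_FOLDER`, the helper names, and the file-name prefixes are assumptions (the prefixes are meant to match the names listed on the dataset page, so check them against the page before running).

```python
import gzip
import os
import shutil
import urllib.request

BASE_URL = 'https://cs.nyu.edu/~ylclab/data/norb-v1.0-small/'
DATA_FOLDER = 'data/smallnorb'  # assumption: change to wherever you keep data

# Assumption: these prefixes match the file names on the dataset page.
PREFIXES = {
    'train': 'smallnorb-5x46789x9x18x6x2x96x96-training',
    'test': 'smallnorb-5x01235x9x18x6x2x96x96-testing',
}
FILE_TYPES = ('dat', 'cat', 'info')


def file_path(split, file_type, folder=DATA_FOLDER):
    """Local path of a decompressed file, e.g. .../...-training-dat.mat"""
    name = '{}-{}.mat'.format(PREFIXES[split], file_type)
    return os.path.join(folder, name)


def download_and_decompress(url, dest_path):
    """Stream a .gz file from url and write the decompressed bytes to dest_path."""
    if os.path.exists(dest_path):
        return dest_path  # already downloaded
    os.makedirs(os.path.dirname(dest_path) or '.', exist_ok=True)
    partial_path = dest_path + '.part'
    with urllib.request.urlopen(url) as response:
        with gzip.GzipFile(fileobj=response, mode='rb') as unzipped:
            with open(partial_path, 'wb') as out:
                shutil.copyfileobj(unzipped, out)
    # Rename only after a complete download, so a failed run isn't mistaken for done.
    os.replace(partial_path, dest_path)
    return dest_path


def download_all(folder=DATA_FOLDER):
    """Fetch all 6 files (3 per split). The dat files take a while."""
    for split in PREFIXES:
        for file_type in FILE_TYPES:
            dest = file_path(split, file_type, folder)
            url = BASE_URL + os.path.basename(dest) + '.gz'
            download_and_decompress(url, dest)
```

Calling `download_all()` does the actual work. It skips files that already exist, so it’s safe to re-run after an interruption.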

The site walks through how the files are represented. Each of the files has a header that explains how to read the data structure of the rest of the file.

In this block, for each file, I read the header, then use it to load the rest of the file into a numpy array.
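
Here’s a sketch of what that parsing can look like. It follows my reading of the format description on the dataset page: a little-endian int32 magic number that encodes the element type, an int32 number of dimensions, and then at least three int32 dimension sizes (padded with 1s when there are fewer than three dimensions). The function name `read_norb_file` is hypothetical, and only the two element types that the SmallNORB files should need are handled.

```python
import struct

import numpy as np

# Magic numbers for the element types used by SmallNORB, as I understand
# the format description (the format defines other types that are not handled here).
MAGIC_TO_DTYPE = {
    0x1E3D4C54: np.dtype('<i4'),  # 32-bit integers (cat and info files)
    0x1E3D4C55: np.dtype('u1'),   # bytes (dat files)
}


def read_norb_file(path):
    """Read the header of a decompressed file, then load the data it describes."""
    with open(path, 'rb') as f:
        magic, ndim = struct.unpack('<ii', f.read(8))
        if magic not in MAGIC_TO_DTYPE:
            raise ValueError('Unsupported magic number: {:#x}'.format(magic))
        dtype = MAGIC_TO_DTYPE[magic]

        # At least 3 dimension sizes are always stored, even for 1-D data.
        n_stored = max(3, ndim)
        dims = struct.unpack('<{}i'.format(n_stored), f.read(4 * n_stored))
        shape = dims[:ndim]

        count = int(np.prod(shape))
        data = np.frombuffer(f.read(count * dtype.itemsize), dtype=dtype)
    # The last dimension changes fastest, which matches numpy's default order.
    return data.reshape(shape)
```

For the training dat file, this should produce an array of shape (24300, 2, 96, 96). Note that `np.frombuffer` returns a read-only array, so call `.copy()` on the result if you need to modify it.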

### Saving the data in a new format

If, at this point, you want SmallNORB in an npz file for another project, you could do something like:

```python
np.savez_compressed(
    'data/smallnorb-20180503-train.npz',
    # pass the loaded training arrays here as keyword arguments
)
```
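
As a fuller sketch, here’s one way to save a split and read it back. The keyword names `dat`, `cat`, and `info` and the two helper functions are assumptions, chosen to mirror the three file types.

```python
import numpy as np


def save_split(path, dat, cat, info):
    """Save one split's arrays into a single compressed npz file."""
    # The keyword names become the keys of the archive.
    np.savez_compressed(path, dat=dat, cat=cat, info=info)


def load_split(path):
    """Load the arrays saved by save_split."""
    with np.load(path) as archive:
        return archive['dat'], archive['cat'], archive['info']
```

The images are uint8, so they compress reasonably well, and loading the npz skips the header parsing entirely.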


## Exploring the dataset

Now I’ll explore the dataset.

The downloaded dataset splits instances across the train and test set. For exploration, I’m going to concatenate them and make a mapping from metadata and category to the image index.
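
A sketch of that step follows. Both helper names are hypothetical, and the key layout (category, instance, elevation, azimuth, lighting) is an assumption; any consistent order would work.

```python
import numpy as np


def combine_splits(train, test):
    """Concatenate matching (dat, cat, info) arrays along the example axis."""
    return tuple(np.concatenate([a, b]) for a, b in zip(train, test))


def build_example_lookup(cat, info):
    """Map (category, instance, elevation, azimuth, lighting) to an image index."""
    lookup = {}
    for index, (category, meta) in enumerate(zip(cat, info)):
        # Plain ints keep the keys hashable and easy to type by hand.
        key = (int(category),) + tuple(int(value) for value in meta)
        lookup[key] = index
    return lookup
```

Since the train and test sets use different instances, no key should appear twice after concatenating, and the combined lookup should have one entry per image.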

## Rotating image example

I can use the example_lookup dictionary to look up related images. So for example, I can vary nothing but the azimuth.
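
Roughly, that could look like the sketch below. It assumes example_lookup is keyed by (category, instance, elevation, azimuth, lighting) and that the images have shape (N, 2, 96, 96). The function names are hypothetical.

```python
def azimuth_sweep(example_lookup, category, instance, elevation, lighting):
    """Return the image indices for all 18 azimuths of one toy, in order."""
    indices = []
    for azimuth in range(0, 36, 2):  # azimuth ids are the even numbers 0..34
        key = (category, instance, elevation, azimuth, lighting)
        indices.append(example_lookup[key])
    return indices


def plot_sweep(images, indices, camera=0):
    """Draw the 18 images from one camera in a 3 x 6 grid."""
    # Imported here so that azimuth_sweep works without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(3, 6, figsize=(12, 6))
    for ax, index in zip(axes.ravel(), indices):
        ax.imshow(images[index][camera], cmap='gray')
        ax.axis('off')
    return fig
```

The same pattern works for the other fields: fix four of the five values and loop over the remaining one.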

### Interactive exploration

This one is my favorite, though you’ll need to run this notebook locally to use it. I can use the sliders to rotate the toys or view different toys.
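
One way to build sliders like this is with `interact` from ipywidgets. That’s an assumption on two counts: ipywidgets isn’t in the requirements listed at the top, and the original widget may have been built differently. The sketch also assumes a lookup keyed by (category, instance, elevation, azimuth, lighting), and the function names are hypothetical.

```python
def slider_specs():
    """(min, max, step) for each slider, following the id ranges of the dataset."""
    return {
        'category': (0, 4, 1),
        'instance': (0, 9, 1),
        'elevation': (0, 8, 1),
        'azimuth': (0, 34, 2),  # only even azimuth ids exist
        'lighting': (0, 5, 1),
    }


def explore(images, example_lookup):
    """Show both camera images for whichever example the sliders select."""
    # Imported here because they are only needed inside a running notebook.
    import matplotlib.pyplot as plt
    from ipywidgets import interact

    def show(category, instance, elevation, azimuth, lighting):
        key = (category, instance, elevation, azimuth, lighting)
        left, right = images[example_lookup[key]]
        fig, axes = plt.subplots(1, 2, figsize=(6, 3))
        for ax, image in zip(axes, (left, right)):
            ax.imshow(image, cmap='gray')
            ax.axis('off')
        plt.show()

    # interact turns each (min, max, step) tuple of ints into an integer slider.
    interact(show, **slider_specs())
```

Calling `explore(all_dat, example_lookup)` in a notebook cell would then draw the sliders, where `all_dat` stands for the concatenated image array.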