For an assignment in a deep learning practical on convolutional neural networks, I needed to implement somewhat efficient convolutions. I learned about numpy.einsum in the process and wanted to share it!
- Part 1 is an introduction to the problem and how I used numpy’s as_strided
- Part 2 is about numpy.einsum
The assignment was to classify handwritten characters using convolutional neural networks. For pedagogical reasons, I needed to implement the convolutions myself.
We used the EMNIST dataset. Below is a sample of 40 example images from the dataset.
The image below shows what happens when kernels are applied (convolved). The first row shows example images; the five rows below are the results of convolving each of five kernels with them, also known as the feature maps.
Convolutional neural nets are pretty cool, but that’s all I’ll say about them for now. For more information, check out cs231n.
The code I’m going to write in this series basically does the following (fyi: if you saw this earlier, I’ve edited it):
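As a stand-in for the full code, here’s a minimal nested-for-loop sketch of the computation, assuming a single 2D grayscale input, a stride of 1, and no padding (the function and variable names are mine):

```python
import numpy as np

def conv2d_loops(image, kernel):
    """Slide the kernel over the image: at each position, multiply
    elementwise and sum to get one pixel of the feature map."""
    kernel_h, kernel_w = kernel.shape
    out_h = image.shape[0] - kernel_h + 1
    out_w = image.shape[1] - kernel_w + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kernel_h, j:j + kernel_w]
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map
```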
There are a couple details on how kernels and inputs are flipped or padded (convolutions vs cross-correlations; forward propagation vs back propagation; dealing with edges), but I’ll assume inputs and kernel are already set up.
Kernels have parameters describing how to weight each pixel. For example, below is a 3 x 3 kernel with 9 parameters:
If my input were a 3 x 3 grayscale image, I could think of putting this kernel on top of the image and multiplying each kernel parameter by the corresponding input pixel value. The resulting feature map would be a single pixel containing the sum of those products.
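To make that concrete, here’s the single-pixel case in numpy (the numbers are made up):

```python
import numpy as np

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]])

# Multiply each kernel parameter by the pixel under it, then sum:
feature_pixel = np.sum(image * kernel)  # -> 5
```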
For a larger image, convolutions are done by sliding the kernel over the image to create the feature map. Here’s the canonical image:
Victor Powell’s post helped me understand image kernels.
A tricky part is telling numpy to slide the kernel across the inputs.
One approach could be using the nested for-loops above, and some classmates had luck speeding up for-loops with Numba.
I wanted to see if I could do it in pure numpy and came across numpy.lib.stride_tricks.as_strided, which tricks numpy into looking at the array data in memory in a new way.
To use as_strided for convolutions, I used it to add two more dimensions, each the size of the kernel. I also reduced the first two dimensions so that they were the size of the resulting feature map. To use as_strided, two additional arguments are needed: the shape of the resulting array and the strides to use.
(An aside: a high-dimensional matrix like this is called a tensor, as in TensorFlow.)
The way I think of this particular 4D tensor is as a spreadsheet where each cell contains a little kernel-sized spreadsheet. If I look at one cell of the outer spreadsheet, the kernel-sized spreadsheet inside it holds the values that I multiply and sum with the kernel parameters to get the corresponding value in the feature map.
Or, in an image:
By getting it into this form, I can use other functions to multiply and sum across dimensions.
One way to understand strides is to imagine how a computer might store a 2D array in memory. Memory is one-dimensional, so to represent a 2D array, a program fakes it. In the gif below, the left shows the faked 2D array and the right shows an imagined memory representation. Moving left or right in the array moves left or right in memory, but moving up or down has to jump forward or back by the width of a row.
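A quick way to see this bookkeeping in numpy (my own toy example):

```python
import numpy as np

arr = np.array([[0, 1, 2],
                [3, 4, 5]])

# In memory, the rows are laid end to end, like the right side of the gif:
print(arr.ravel())  # [0 1 2 3 4 5]

# Moving down one row means jumping forward by the row width (3 elements):
print(arr[1, 0])    # 3, which sits 3 positions into the flat buffer
```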
This is where .strides comes in handy. For example, when the array goes to print the next element to the right, I can tell it to jump forward as if it were moving down. If I do this correctly, I can produce the results above. That said, figuring out the strides parameter is one of the trickiest parts.
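Here’s a tiny version of that trick (my own example, assuming int64 elements, so 8 bytes each): swapping the shape and strides makes numpy walk the same memory column-first, which produces the transpose without copying anything.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

arr = np.arange(12, dtype=np.int64).reshape(3, 4)  # strides: (32, 8)

# Stepping "right" in the view now jumps 32 bytes, i.e. a whole row of
# the original, so the view is the transpose of arr. No data is copied.
transposed = as_strided(arr, shape=(4, 3), strides=(8, 32))
print(np.array_equal(transposed, arr.T))  # True
```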
Phew. Here’s some example code that does this:
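Below is a sketch of the approach described above, under the same assumptions as before (one grayscale image, stride 1, no padding; the names are mine):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def conv2d_strided(image, kernel):
    kernel_h, kernel_w = kernel.shape
    out_h = image.shape[0] - kernel_h + 1
    out_w = image.shape[1] - kernel_w + 1

    # 4D tensor of windows: cell (i, j) of the outer "spreadsheet" holds
    # the kernel-sized sub-array the kernel covers at output position
    # (i, j). Reusing the image's own strides means no data is copied.
    s0, s1 = image.strides
    windows = as_strided(image,
                         shape=(out_h, out_w, kernel_h, kernel_w),
                         strides=(s0, s1, s0, s1))

    # Multiply each window by the kernel and sum over the window axes.
    return (windows * kernel).sum(axis=(2, 3))
```

Since the view shares memory with the image, the shape and strides have to be exactly right, which is what the warning below is about.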
Final note: “This function has to be used with extreme care”
The as_strided documentation says, "This function has to be used with extreme care".
I felt fine experimenting with it, because the code wasn’t for production and I had some idea of how memory layouts work. But I still messed up, and what happened was interesting.
After implementing convolutions, I decided to use as_strided to broadcast the bias term. However, I forgot to update a variable, and it expanded a tiny test array into a much-too-large matrix. That resulted in it pulling garbage numbers out of other parts of memory! It would randomly add things like 10^300 to my convolutions!
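If you’re curious what that failure mode looks like, here’s a contrived sketch (not my original bug): an as_strided call whose shape claims more elements than the buffer actually holds.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

tiny = np.zeros(4)  # 4 float64 elements, 8 bytes each

# The shape promises 1000 elements, but the buffer only holds 4. Reading
# past the end pulls whatever bytes happen to sit in adjacent memory:
# garbage values, or in the worst case a crash.
too_big = as_strided(tiny, shape=(1000,), strides=(8,))
print(too_big[-5:])  # undefined: garbage numbers from out of bounds
```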
One thing I’m learning in machine learning is that when things are horribly broken, they can still seem to work, just with slightly worse performance than expected. This was one of those cases.
I didn’t realize anything bad was happening and thought it was just that CNNs are harder to train. I ended up getting it to train with a sigmoid non-linearity, with okay but not great performance.
For fun, here’s what the filters looked like: