Prepping for Simple Image Recognition

This week at Galvanize, we covered a variety of topics, from web scraping to clustering techniques. I want to focus on dimensionality reduction today, as it’s a challenging but crucial technique for working with real-world data.

This week also marked our first foray into working with text and image data. Up to this point, we’d always started from nice tabular, numerical data. Machine learning algorithms really only understand numbers, so we must first translate our text or images into something the machine understands. We want to give our algorithm a feature matrix– really, just something resembling a spreadsheet, with our features as columns across the top and each data point as a row in the spreadsheet. Each ‘cell’ would then be a number representing that data point’s value for that feature. How would you turn an image into such a table?

It turns out you break each image down into pixels and assign a value that corresponds to the shade of each pixel. Each pixel location becomes a feature. If we’re talking grayscale images, the numeric value in each row of our table will be how light or dark that specific pixel is in the row’s image, usually on a scale from 0 (black) to 255 (white). You can imagine this turns images into huge amounts of data– for example, even one little 8×8 pixel image yields 64 numerical features.
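To make that concrete, here’s a minimal sketch in Python (using NumPy, with random arrays standing in for real grayscale images) of how images become rows of a feature matrix:

```python
import numpy as np

# A stand-in for one 8x8 grayscale image: each entry is a pixel
# intensity from 0 (black) to 255 (white).
image = np.random.randint(0, 256, size=(8, 8))

# Flatten the 2D grid of pixels into a single row of 64 features.
# Each column of the feature matrix corresponds to one pixel location.
row = image.flatten()
print(row.shape)  # (64,) -- one feature per pixel

# Stacking many such rows gives the spreadsheet-like feature matrix:
# images as rows, pixel locations as columns.
images = [np.random.randint(0, 256, size=(8, 8)) for _ in range(100)]
X = np.vstack([img.flatten() for img in images])
print(X.shape)  # (100, 64)
```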

Text and image data quickly snowballs into thousands of features– which is a problem for some of our models. In fact, it’s such a problem that it’s known as the “curse of dimensionality.” To address it, there are methods to identify and extract the most important components, or transformed features, and use those in your model.

MNIST, a classic machine learning dataset for image recognition.

We illustrated this with the MNIST dataset, a classic dataset for image recognition. MNIST is a bunch of handwritten digits, as seen above in a sample of 100 such images. It’s easy for us to look at any given image and recognize it as a 2 or a 4, but how could we train a computer to do this?

You might recognize this as a classification problem (is the image class ‘2’, class ‘3’, class ‘4’, etc.?). Our goal in one exercise this week was to perform just the pre-processing required before one could use a classifier. We used a dimensionality reduction technique called principal component analysis (PCA) to pull out the most important transformed features, or components, from the dataset.
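The post doesn’t include the original code, but here’s a minimal sketch of that step using scikit-learn. It assumes scikit-learn’s built-in load_digits dataset, a small MNIST-style collection of 8×8 digit images, as a stand-in:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Load a small MNIST-style dataset: 8x8 grayscale digit images,
# already flattened into 64 pixel features per image.
digits = load_digits()
X, y = digits.data, digits.target

# Keep only the digits 0-5, as in the projection described below.
mask = y <= 5
X, y = X[mask], y[mask]

# Fit PCA and keep the first two principal components --
# the two transformed features that capture the most variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                     # (n_images, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component
```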

The image below shows what happens when you project the transformed dataset (here using only the digits 0–5) down into 2 dimensions, so it can be easily visualized. Each point is marked with the true label, or class, of its image. The x- and y-axes are transformed features derived from our original feature matrix of pixels. One drawback of PCA is that the transformation makes the features challenging to interpret– they are no longer the columns from our original feature matrix, but some weird combination of them that we can’t easily name.
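For reference, a plot like the one below could be generated with something along these lines (again a sketch assuming the scikit-learn digits dataset, not the post’s actual plotting code):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
mask = digits.target <= 5
X, y = digits.data[mask], digits.target[mask]

# Project the 64 pixel features down to 2 principal components.
X_2d = PCA(n_components=2).fit_transform(X)

# Scatter the projected points, colored by true digit class.
fig, ax = plt.subplots(figsize=(8, 6))
for digit in range(6):
    pts = X_2d[y == digit]
    ax.scatter(pts[:, 0], pts[:, 1], alpha=0.5, label=str(digit))
ax.set_xlabel("First principal component")
ax.set_ylabel("Second principal component")
ax.legend(title="Digit")
plt.show()
```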

After PCA, the data projected into two dimensions.

Already, you’ll notice that the 4’s show up near each other, the 0’s are grouped away from the 4’s with little overlap, while the 2’s and 3’s overlap a lot– those digits are visually much more similar than a 0 and a 4, no? If we were to cluster this dataset, you can imagine getting pretty decent results from using only these first two principal components.
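As a rough way to test that intuition (this sketch isn’t from the original exercise), you could run k-means on the two-component projection and compare the resulting clusters against the true labels:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

digits = load_digits()
mask = digits.target <= 5
X, y = digits.data[mask], digits.target[mask]

# Reduce to the first two principal components, as above.
X_2d = PCA(n_components=2).fit_transform(X)

# Cluster into 6 groups -- one per digit class 0-5 -- using only
# those two transformed features.
clusters = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X_2d)

# The adjusted Rand index compares clusters to the true labels
# (1.0 is a perfect match, ~0.0 is random assignment).
print(adjusted_rand_score(y, clusters))
```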
