MNIST Multi-Layer Perceptron Model

Case Study

We will be building a Multi-Layer Perceptron model to classify handwritten digits using TensorFlow. The images we will be working with are greyscale images of size 28 x 28 pixels, or 784 pixels total. Our features are the individual pixel values: a pixel is either “white” (blank, with a value of 0) or it has some non-zero intensity. We will try to correctly predict which digit is written based solely on the image data in the form of a flattened array. This type of problem (image recognition) is a great use case for deep learning methods!

1. Preparation

In [1]:
import tensorflow as tf

MNIST is a popular data set that we can import directly from TensorFlow.

In [7]:
from tensorflow.examples.tutorials.mnist import input_data
In [8]:
mnist = input_data.read_data_sets("/tmp/data", one_hot=True)
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
In [9]:
type(mnist)
Out[9]:
tensorflow.contrib.learn.python.learn.datasets.base.Datasets

The training set contains 55,000 images, and each image is a flattened array of 784 pixel values.

In [10]:
mnist.train.images.shape 
Out[10]:
(55000, 784)
In [17]:
sample = mnist.train.images[12].reshape(28,28)
In [14]:
import matplotlib.pyplot as plt
%matplotlib inline
In [18]:
plt.imshow(sample,cmap='Greys')
Out[18]:
<matplotlib.image.AxesImage at 0x1a42db83c8>

Parameters

It is really difficult to know what good parameter values are for a data set you have no experience with; however, since MNIST is so well studied, we have some reasonable values for our data below. The parameters here are:

  • Learning Rate – How large a step the optimiser takes when adjusting the weights to minimise the cost function.
  • Training Epochs – How many full training cycles to go through.
  • Batch Size – Size of the ‘batches’ of training data fed in at each step.
In [19]:
learning_rate = 0.001 #how quickly to learn
training_epochs = 15
batch_size = 100

Network Parameters

In [20]:
n_classes = 10 #MNIST total classes (0 - 9 digits)
n_samples = mnist.train.num_examples #(how many images)
In [21]:
n_input = 784 #MNIST data input (img shape is 28x28)
In [22]:
n_hidden_1 = 256 # 1st layer number of neurons
n_hidden_2 = 256 # 2nd layer number of neurons

2. Multi-Layer Model

  1. Receive the input data array and send it to the first hidden layer (with 256 neurons in this case). Between layers the data has random initial weights attached to it, and at each node it undergoes an activation function (along with a bias). In this example we will be using the ReLU activation function, a very simple rectifier function that returns x or zero, whichever is larger (see the brief sketch after this list). For our final output layer we will use a linear activation with matrix multiplication.
  2. The data then proceeds to the next hidden layer (repeating step 1) and, in our case of 2 hidden layers, reaches the final output layer. The more hidden layers you use, the longer the model will take to run, but it has a higher chance of producing a more accurate result.
  3. Once we receive the final output, we need to evaluate it using a loss function (also called a cost function), which tells us how far off we are from the desired result.
  4. Apply an optimisation function to minimise the cost. This is done by adjusting the weight values accordingly across the network. We will be using the Adam optimiser in this example. We can adjust how quickly we apply this optimisation by tweaking the learning rate parameter above. The lower the rate, the higher the possibility of accurate training results (again, at the cost of having to wait longer).
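
As a minimal, stand-alone sketch of the ReLU behaviour mentioned in step 1 (illustrative only, not part of the model, and assuming tf has been imported as above):

with tf.Session() as demo_sess:
    # ReLU is applied element-wise: f(x) = max(0, x)
    print(demo_sess.run(tf.nn.relu([-2.0, -0.5, 0.0, 1.5, 3.0])))
    # expected output: [ 0.   0.   0.   1.5  3. ]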
In [23]:
def multilayer_perceptron(x, weights, biases):
    '''
    x: Placeholder for data input
    weights: Dictionary of weights
    biases: Dictionary of bias values
    '''
    # First hidden layer with RELU activation
    # X * W + B
    layer_1 = tf.add(tf.matmul(x,weights['h1']),biases['b1'])
    # RELU(X * W + B) = RELU -> f(x) = max(0,x)
    layer_1 = tf.nn.relu(layer_1)
    # Second hidden layer
    layer_2 = tf.add(tf.matmul(layer_1,weights['h2']),biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    # Output layer
    out_layer = tf.matmul(layer_2,weights['out']) + biases['out']
    return out_layer

Weights and Bias

In order for our TensorFlow model to work we need to create two dictionaries containing our weight and bias objects for the model. We can use the tf.Variable object type. This is different from a constant because TensorFlow’s Graph object becomes aware of the states of all the variables. A Variable is a modifiable tensor that lives in TensorFlow’s graph of interacting operations. It can be used and even modified by the computation. We will generally have the model parameters be Variables.
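
As a quick stand-alone illustration of what “modifiable” means here (a throwaway sketch, not part of the model; running it simply adds an extra Variable to the default graph):

demo_var = tf.Variable(3.0)
with tf.Session() as demo_sess:
    demo_sess.run(tf.global_variables_initializer())
    print(demo_sess.run(demo_var))            # 3.0
    demo_sess.run(tf.assign(demo_var, 5.0))   # re-assign the value held in the graph
    print(demo_sess.run(demo_var))            # 5.0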

tf.random_normal – outputs values that are normally distributed

In [25]:
weights = {
    'h1':tf.Variable(tf.random_normal([n_input,n_hidden_1])),
    'h2':tf.Variable(tf.random_normal([n_hidden_1,n_hidden_2])),
    'out':tf.Variable(tf.random_normal([n_hidden_2,n_classes]))
}
In [26]:
weights
Out[26]:
{'h1': <tf.Variable 'Variable:0' shape=(784, 256) dtype=float32_ref>,
'h2': <tf.Variable 'Variable_1:0' shape=(256, 256) dtype=float32_ref>,
'out': <tf.Variable 'Variable_2:0' shape=(256, 10) dtype=float32_ref>}
In [27]:
biases = {
    'b1':tf.Variable(tf.random_normal([n_hidden_1])),
    'b2':tf.Variable(tf.random_normal([n_hidden_2])),
    'out':tf.Variable(tf.random_normal([n_classes]))
}

TensorFlow Graph Input

In [28]:
x = tf.placeholder('float',[None,n_input])
y = tf.placeholder('float',[None,n_classes])

Construct model

In [29]:
pred = multilayer_perceptron(x, weights,biases)

Define our cost and optimisation functions

In [36]:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels= y))
optimiser = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

Training the Model

mnist.train.next_batch() returns a tuple in the form (X, y): an array of the image data and a y array indicating the class in the form of a one-hot binary array. X is our input and y is the actual label.
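
For example (an illustrative aside, assuming mnist and batch_size are defined as above), we can peek at the shapes of one batch:

example_x, example_y = mnist.train.next_batch(batch_size)
print(example_x.shape)   # (100, 784) -> a batch of flattened images
print(example_y.shape)   # (100, 10)  -> the matching one-hot labels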

Run Session

In [37]:
sess = tf.InteractiveSession()

Initialisation of Variables

The first thing we do is initialise all our tf.Variable objects

In [40]:
init = tf.global_variables_initializer()
In [41]:
sess.run(init)
In [42]:
# 15 loops as we set training_epochs = 15
for epoch in range(training_epochs):
    # Cost
    avg_cost = 0.0
    total_batch = int(n_samples/batch_size)
    for i in range(total_batch):
        # Grab the next batch of training data and labels
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Feed dictionary for optimization and loss value
        # Returns a tuple, but we only need 'c' the cost
        # So we set an underscore as a "throwaway"
        _,c = sess.run([optimiser,cost],feed_dict={x:batch_x,y:batch_y})
        avg_cost += c/total_batch
    print("Epoch: {} cost{:.4f}".format(epoch+1,avg_cost))
print("Model has completed {} Epochs of training".format(training_epochs))
Epoch: 1 cost178.5794
Epoch: 2 cost43.1790
Epoch: 3 cost27.4076
Epoch: 4 cost19.1422
Epoch: 5 cost14.0694
Epoch: 6 cost10.5527
Epoch: 7 cost8.1026
Epoch: 8 cost6.0094
Epoch: 9 cost4.5397
Epoch: 10 cost3.4362
Epoch: 11 cost2.5782
Epoch: 12 cost1.9555
Epoch: 13 cost1.4509
Epoch: 14 cost1.1742
Epoch: 15 cost0.9349
Model has completed 15 Epochs of training

Model Evaluations

TensorFlow comes with some built-in functions to help evaluate our model, including tf.equal and tf.cast together with tf.reduce_mean.

tf.equal()

This is essentially just a check of predictions == y_test. In our case, since we know the label format is a single 1 in an array of zeros, we can compare the argmax() location of that 1 with the argmax() of the prediction scores.

In [44]:
correct_pred = tf.equal(tf.argmax(pred,1),tf.argmax(y,1)) # checks if x==y?
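
As a quick illustration of why comparing argmax() locations works (a stand-alone NumPy aside, not part of the evaluation graph):

import numpy as np
print(np.argmax([0., 0., 0., 1., 0.]))           # 3 -> the class index encoded by a one-hot label
print(np.argmax([0.1, 0.2, 0.05, 2.3, -0.4]))    # 3 -> the class with the highest predicted score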

In order to get a numerical value for our predictions we need to use tf.cast to cast the Tensor of booleans into a Tensor of floating point values so that we can take its mean.

In [46]:
correct_pred = tf.cast(correct_pred,'float')

Finally, we use the tf.reduce_mean function to grab the mean of the elements across the tensor

In [49]:
accuracy = tf.reduce_mean(correct_pred)
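
To see why the mean of the cast booleans gives the accuracy, here is a tiny worked example (illustrative only): four predictions, three of them correct.

import numpy as np
demo_correct = np.array([True, False, True, True], dtype=np.float32)   # booleans cast to floats
print(demo_correct.mean())   # 0.75 -> the fraction of correct predictions, i.e. the accuracy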

The accuracy is still a Tensor object. We still need to pass in our actual test data!

In [50]:
mnist.test.labels
Out[50]:
array([[ 0.,  0.,  0., ...,  1.,  0.,  0.],
[ 0.,  0.,  1., ...,  0.,  0.,  0.],
[ 0.,  1.,  0., ...,  0.,  0.,  0.],
..., 
[ 0.,  0.,  0., ...,  0.,  0.,  0.],
[ 0.,  0.,  0., ...,  0.,  0.,  0.],
[ 0.,  0.,  0., ...,  0.,  0.,  0.]])
In [51]:
mnist.test.images
Out[51]:
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
[ 0.,  0.,  0., ...,  0.,  0.,  0.],
[ 0.,  0.,  0., ...,  0.,  0.,  0.],
..., 
[ 0.,  0.,  0., ...,  0.,  0.,  0.],
[ 0.,  0.,  0., ...,  0.,  0.,  0.],
[ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32)

The eval() method allows you to directly evaluate this tensor in a Session.

In [52]:
accuracy.eval({x:mnist.test.images,y:mnist.test.labels})
Out[52]:
0.94459999

94% accuracy! But this actually isn’t anywhere near as good as it could be. Running many more training cycles on this data (around 20,000) can produce an accuracy of around 99%, but that will take a very long time to run!
