Optimizing AI Models Using Convolutional Neural Networks

This guide is part two of a previous guide I made, The Simple Guide to AI and Machine Learning With Python. It covers how to improve the accuracy of the model you built in that guide, so I'm going to assume you have already completed it before following along here.

In the previous guide, we learned how to use dense neural networks to make a program that recognizes handwriting. That network was not especially accurate: it tended to get digits wrong unless it had been specifically tuned for them. Ideally, you want the network to recognize any digit you give it without having to optimize it for every single case.

Convolutional neural networks were made to solve this problem. Rather than learning from the image as a whole, a convolutional neural network learns to recognize small features within it. For example, rather than memorizing the entire image of a hand-drawn three, the network learns that a three is made of two curves stacked vertically, which helps it recognize other threes in the future, no matter how they were drawn.

Step One: Initial Setup

For this step, we can just use the code that we used in the previous tutorial to prepare the MNIST dataset.

Python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras import backend as K
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# helper functions
def show_min_max(array, i):
  random_image = array[i]
  print("min and max value in image: ", random_image.min(), random_image.max())


def plot_image(array, i, labels):
  plt.imshow(np.squeeze(array[i]))
  plt.title(" Digit " + str(labels[i]))
  plt.xticks([])
  plt.yticks([])
  plt.show()

def predict_image(model, x):
  x = x.astype('float32')
  x = x / 255.0

  x = np.expand_dims(x, axis=0)

  image_predict = model.predict(x, verbose=0)
  print("Predicted Label: ", np.argmax(image_predict))

  plt.imshow(np.squeeze(x))
  plt.xticks([])
  plt.yticks([])
  plt.show()
  return image_predict

img_rows, img_cols = 28, 28  

num_classes = 10 

(train_images, train_labels), (test_images, test_labels) = mnist.load_data() 
(train_images_backup, train_labels_backup), (test_images_backup, test_labels_backup) = mnist.load_data() 

print(train_images.shape) 
print(test_images.shape) 

train_images = train_images.reshape(train_images.shape[0], img_rows, img_cols, 1)
test_images = test_images.reshape(test_images.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)

train_images = train_images.astype('float32')
test_images = test_images.astype('float32')

train_images /= 255
test_images /= 255

train_labels = keras.utils.to_categorical(train_labels, num_classes)
test_labels = keras.utils.to_categorical(test_labels, num_classes)

print(train_images[1232].shape)
Expected Output
(60000, 28, 28)
(10000, 28, 28)
(28, 28, 1)

Now that the initial setup is in place, we can jump straight to creating our network.

Creating Our Network

As with the densely connected network, we are still going to have epochs: the number of times the network passes through the entire training set.

With that explanation out of the way, we can define our model.

Python
from tensorflow.keras.models import Sequential 
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Dropout

epochs =  10
model = Sequential()

Now, let’s start adding the layers of our neural network.

Explaining Convolutional Layers

With our previous network, we added three dense (fully connected) layers. In our new convolutional network, the layers work differently.

Convolutional layers consist of groups of neurons called filters that move across the image and activate based on the pixels they read. These filters learn to recognize features in the data.

You can adjust the number and size of the filters in your network to your liking. Bigger filters observe larger parts of the image at once, while smaller filters pick up finer details. A larger number of filters means the network can recognize a wider range of image features.

There are several advantages to having layers and filters work this way. For one, smaller filters can be more computationally efficient, since they only examine a small part of the image at a time. Furthermore, because each filter is moved across the entire image, the network is not thrown off by feature displacement (when the same feature appears in two images but in different places). And since a filter focuses on a small area at a time, it is not distracted by the rest of the image.
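To build some intuition for what a single filter does, here is a rough NumPy sketch of a filter sliding across a tiny image. This is only an illustration of the idea, not how Keras implements convolutions internally, and none of these names are part of the model we are building.

Python
import numpy as np

def slide_filter(image, kernel):
  # Slide a small filter across a 2D image and record how strongly it
  # responds at each position (a "valid" convolution, no padding).
  rows = image.shape[0] - kernel.shape[0] + 1
  cols = image.shape[1] - kernel.shape[1] + 1
  feature_map = np.zeros((rows, cols))
  for r in range(rows):
    for c in range(cols):
      patch = image[r:r + kernel.shape[0], c:c + kernel.shape[1]]
      feature_map[r, c] = np.sum(patch * kernel)
  return feature_map

# A toy 5x5 "image" with a vertical edge, and a 3x3 filter that responds to edges.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)
print(slide_filter(image, kernel))  # 3x3 feature map, strongest near the edge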

We will be using multiple convolutional layers to complete our new-and-improved handwriting recognition software.

Implementing Convolutional Layers

Keras makes it easy to create the convolutional layers we will use in our model. We will use Conv2D to create the first layer of our neural network.

In the case below, we will have 32 filters, a kernel size of (3,3), an input shape of (28,28,1) (which we saved to the input_shape variable when we ran the setup code at the beginning), and the ReLU activation function. I go more in-depth into what ReLU is in my previous guide.

Python
model.add(Conv2D(filters=32, kernel_size=(3,3),activation='relu',input_shape=input_shape))

Conv2D creates 2D convolutional layers, meaning they scan across flat, two-dimensional data such as images.

Explaining Pooling Layers

Convolutional layers can get quite computationally intensive, which is where pooling layers come in. Increasing the number of neurons increases the computation time required. Pooling layers are essentially windows that move across the image in set strides, reducing each window's contents to a single value (a max pooling layer keeps the largest value). Depending on the window size and stride, this shrinks the output image.

For this scenario, we will use a 2×2 pool with a stride of 2. This halves the image's row and column count, simplifying the data without losing too much detail.

Python
model.add(MaxPooling2D(pool_size=(2,2)))
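
If it helps to see the arithmetic, here is a short NumPy sketch (separate from the model) of what a 2×2 max pool with a stride of 2 does to a small 4×4 array:

Python
import numpy as np

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 5, 6]], dtype=float)

# Keep the largest value in each non-overlapping 2x2 window,
# halving the number of rows and columns.
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6. 4.]
               #  [7. 9.]]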

Most networks have at least one set of alternating convolutional and pooling layers.

More Convolutional Layers

Convolutional layers pick up the low-level features of an image. By adding more of them, we can start working with higher-level features.

We define the layer the same way we defined the previous one, but now we have 64 filters, not 32. We also do not specify the input shape, as it is inferred from the previous layer.

Python
model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu'))

Dropout Layers

Dropout layers randomly deactivate a percentage of their input neurons during training, forcing the remaining neurons to adapt to the task. Without dropout, larger and more complicated networks risk relying too heavily on a small set of neurons instead of all neurons learning, which contributes to overfitting: the network performs well on the training data but worse on data it has not seen.
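
If you want to see dropout in action on its own, here is a small optional sketch (not part of our model) that applies a Keras Dropout layer directly to some data. Note that the surviving values are scaled up so the expected total stays the same, and that dropout is only active during training.

Python
import numpy as np
import tensorflow as tf

layer = tf.keras.layers.Dropout(rate=0.3)
data = np.ones((1, 10), dtype='float32')

# With training=True, roughly 30% of the values are zeroed and the rest
# are scaled by 1 / (1 - 0.3).
print(layer(data, training=True).numpy())

# With training=False (the default at inference time), dropout does nothing.
print(layer(data, training=False).numpy())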

Below, our dropout layer will randomly deactivate 30% of its input neurons, which we specify as a rate of 0.3.

Python
model.add(Dropout(rate=0.3))
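
One note: the model summary later in this guide includes a third convolutional layer (conv2d_2, with 32 filters) between the dropout and flatten layers. To reproduce the same layer shapes and parameter counts, add it here:

Python
model.add(Conv2D(filters=32, kernel_size=(3,3), activation='relu'))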

Dense and Flatten Layers

After all the convolutional and pooling layers, we need a layer to help make the final decision. This will be a regular, fully connected dense layer. Before we can connect it, we need to flatten the filter outputs into a single one-dimensional vector.

We can start by flattening the image using the Keras Flatten layer.

Python
model.add(Flatten())

Now, we can add a dense layer with ReLU activation and 32 neurons.

Python
model.add(Dense(units=32,activation='relu'))


Output Layers

As in the fully connected network from the previous guide, we need a final layer that shrinks the previous dense layer down to the number of output classes. As before, the final prediction is the class with the highest output value.

Below, we will add a dense layer to be our output layer. The number of neurons should be 10 because there are ten possible output classes, and the activation should use Softmax.

Python
model.add(Dense(units=10,activation='softmax'))

Model Summary

Now, we can print out our model summary:

Python
model.summary()
Expected Output
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2  (None, 13, 13, 32)        0         
 D)                                                              
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 dropout (Dropout)           (None, 11, 11, 64)        0         
                                                                 
 conv2d_2 (Conv2D)           (None, 9, 9, 32)          18464     
                                                                 
 flatten (Flatten)           (None, 2592)              0         
                                                                 
 dense (Dense)               (None, 32)                82976     
                                                                 
 dense_1 (Dense)             (None, 10)                330       
                                                                 
=================================================================
Total params: 120586 (471.04 KB)
Trainable params: 120586 (471.04 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

Compiling and Training

Now we will compile the network. The loss and metric will be the same as the ones we used in the previous guide: categorical cross-entropy and accuracy, respectively. This time, however, we will use RMSProp (Root Mean Squared Propagation) as our optimizer. RMSProp is one of many training algorithms Keras can use to teach the network how to improve, adjusting its weights to make the loss as small as possible.

Python
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',  metrics=['accuracy'])

Now, we can start training.

The fit function is what actually performs the training. Let's look at the parameters we are passing to it.

  • train_images and train_labels specify the data the model will be trained on. The images are given to the network, and the network tries to predict the appropriate label
  • batch_size allows us to put the network’s data into batches. We can always change it later, but for now we have set it to 64
  • epochs defines the number of epochs (times the network reiterates on the training data) the network should use
  • validation_data defines the data the model is testing itself on
  • shuffle is turned on so that Keras shuffles the training data before each epoch, rather than relying on the order of the data
Python
model.fit(train_images, train_labels, batch_size=64, epochs=epochs, validation_data=(test_images, test_labels), shuffle=True)
Expected Output (may vary)
Epoch 1/10
938/938 [==============================] - 23s 24ms/step - loss: 0.1677 - accuracy: 0.9473 - val_loss: 0.0501 - val_accuracy: 0.9832
Epoch 2/10
938/938 [==============================] - 23s 24ms/step - loss: 0.0512 - accuracy: 0.9841 - val_loss: 0.0331 - val_accuracy: 0.9885
Epoch 3/10
938/938 [==============================] - 22s 23ms/step - loss: 0.0354 - accuracy: 0.9894 - val_loss: 0.0347 - val_accuracy: 0.9894
Epoch 4/10
938/938 [==============================] - 22s 24ms/step - loss: 0.0283 - accuracy: 0.9918 - val_loss: 0.0349 - val_accuracy: 0.9879
Epoch 5/10
938/938 [==============================] - 22s 23ms/step - loss: 0.0228 - accuracy: 0.9928 - val_loss: 0.0271 - val_accuracy: 0.9911
Epoch 6/10
938/938 [==============================] - 22s 24ms/step - loss: 0.0199 - accuracy: 0.9938 - val_loss: 0.0273 - val_accuracy: 0.9909
Epoch 7/10
938/938 [==============================] - 22s 23ms/step - loss: 0.0155 - accuracy: 0.9953 - val_loss: 0.0299 - val_accuracy: 0.9904
Epoch 8/10
938/938 [==============================] - 22s 24ms/step - loss: 0.0140 - accuracy: 0.9956 - val_loss: 0.0321 - val_accuracy: 0.9911
Epoch 9/10
938/938 [==============================] - 22s 24ms/step - loss: 0.0120 - accuracy: 0.9960 - val_loss: 0.0387 - val_accuracy: 0.9905
Epoch 10/10
938/938 [==============================] - 23s 25ms/step - loss: 0.0112 - accuracy: 0.9968 - val_loss: 0.0334 - val_accuracy: 0.9918

Model Evaluation

Now, we have to test the model on the test data. To do this, we will use the evaluate function, which returns the loss and the accuracy. The accuracy is a fraction between 0 and 1, so multiply by 100 to get a percentage.

Python
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
Expected Output (may vary)
313/313 - 1s - loss: 0.0334 - accuracy: 0.9918 - 886ms/epoch - 3ms/step

In our case above, the accuracy was 99.18%, which is pretty good.
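
As a quick sanity check, you can reuse the predict_image helper from the setup code on a single image. The helper scales the pixel values itself, so we pass it an image from the unscaled backup copy of the test set, reshaped to include the channel dimension the CNN expects:

Python
predict_image(model, test_images_backup[0].reshape(img_rows, img_cols, 1))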

Exporting Our Model

Now, we can export the model to be used elsewhere. We can do this by using model.save.

Python
model.save('cnn_model.h5')

This will save the model to a file called “cnn_model.h5”, where it can then be loaded in other pieces of code.
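
As a rough example of loading it back (for instance, in another script), you can use Keras's load_model function. Any input you feed the loaded model must be prepared the same way as before: shape (28, 28, 1), with pixel values scaled to the 0-1 range.

Python
from tensorflow import keras

# Load the model saved above and print its structure to confirm it matches.
loaded_model = keras.models.load_model('cnn_model.h5')
loaded_model.summary()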