In this guide, you will learn how to create an AI that recognizes handwriting in Python, using dense neural networks and the MNIST dataset. We will use TensorFlow to train the AI, and basic knowledge of the linear algebra used in AI is strongly recommended; you can refer to this guide to understand the linear algebra used in AI. In the next part, we upgrade the neural network's accuracy using convolutional neural networks.
Prerequisites
To follow along, you will first need to install Python and make sure pip is available on your PATH (by adding it to your .bashrc file on Linux, or to the Environment Variables on Windows or macOS). Then, run the commands below to install the required libraries:
pip install "tensorflow<2.11"
pip install pandas openpyxl numpy matplotlib
Note: If installing this version of TensorFlow does not work, you can run pip install tensorflow instead. Everything will function as normal, but TensorFlow will not be able to utilize your GPU.
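Optionally, you can verify the installation with a quick check. This snippet is just a sanity check of my own and is not required for the rest of the guide:
import tensorflow as tf

print(tf.__version__)                          # the installed TensorFlow version
print(tf.config.list_physical_devices('GPU'))  # an empty list means TensorFlow is running CPU-only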
Writing The Code
In a new Python file, we will first import the libraries we need, along with the MNIST dataset:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras import backend as K
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
We then define some helper functions that will make it easier to visualize the data later on in the code. I will not go over how they work; they are not a necessity, just there to help us visualize the data:
def show_min_max(array, i):
    random_image = array[i]
    print(random_image.min(), random_image.max())

def plot_image(array, i, labels):
    plt.imshow(np.squeeze(array[i]))
    plt.title(" Digit " + str(labels[i]))
    plt.xticks([])
    plt.yticks([])
    plt.show()

def predict_image(model, x):
    x = x.astype('float32')
    x = x / 255.0
    x = np.expand_dims(x, axis=0)
    image_predict = model.predict(x, verbose=0)
    print("Predicted Label: ", np.argmax(image_predict))
    plt.imshow(np.squeeze(x))
    plt.xticks([])
    plt.yticks([])
    plt.show()
    return image_predict

def plot_value_array(predictions_array, true_label, h):
    plt.grid(False)
    plt.xticks(range(10))
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array[0], color="#777777")
    plt.ylim([(-1*h), h])
    predicted_label = np.argmax(predictions_array)
    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')
    plt.show()
In the MNIST Data set (the dataset that we will be using), there are 60,000 training images and 10,000 test images. Each image is 28 x 28 pixels. There are 10 possible outputs (or to be more technical, output classes), and there is one color channel, meaning that each image is stored as a 28 x 28 grid of numbers between 0 and 255. It also means that each image is monochrome.
We can use this data to set some variables:
img_rows = 28 # Rows in each image
img_cols = 28 # Columns in each image
num_classes = 10 # Output Classes
Now, we will load the training images and labels, along with another set of images and labels used to evaluate the model's performance after training (these are called the test images/labels).
What Are Images and Labels?
These can also be called data and labels. The data is the input the computer is given, while the labels are the correct answers it is trying to predict. Most of the time, the model tries to predict labels based on the data it is given.
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
The next step is not required, and we don't make use of it anywhere else in the code; however, it is recommended, especially if you are using a Python notebook. It creates a duplicate, untouched version of the train and test data as a backup:
(train_images_backup, train_labels_backup), (test_images_backup, test_labels_backup) = mnist.load_data()
Now, we test to see if we loaded the data correctly:
print((train_images.shape, test_images.shape))
((60000, 28, 28), (10000, 28, 28))
Why Are They Those Shapes?
The images are 28×28 pixels, which explains the last two dimensions of each shape. Because each image is currently stored as a plain grid of pixel values (not yet in the form our neural network can read, by the way; we will fix this later), we do not need any more dimensions. And, as mentioned earlier, there are 60,000 training images and 10,000 testing images, which explains the first dimension of each tensor.
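To make the relationship between images and labels concrete, we can peek at a single training example. This check is just an illustration and is not used anywhere else in the guide; based on the plot we produce later, the label at index 100 should be 5.
# The data: a 28 x 28 grid of pixel intensities between 0 and 255.
print(train_images[100].shape)  # (28, 28) -- we have not added the channel dimension yet
# The label: the digit this image represents.
print(train_labels[100])        # expected to print 5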
One goal of this tutorial is to get you comfortable with machine learning, so it is worth knowing that data can be formatted in many different ways, and it is up to you to figure out how to get your datasets to work with your model.
Because the MNIST dataset was made for exactly this purpose, it is already ready to use, and little to no reshaping or reformatting is needed.
However, you will often come across data that is not so well formatted or ready for your machine learning model or scenario.
It is important to develop this skill, because in your machine learning career you are going to have to deal with many different types of data.
Now, let's do the only reshaping we really need to do: reshaping the data to fit our neural network's input layer by converting each plain grid of pixel values into an image the network can read. We do this by adding the number of color channels as a dimension, and because the images are monochrome, we only need to add a single channel.
What is a Shape in Neural Networks?
A shape is the size of the linear algebra object (vector, matrix, or tensor) you want to represent in code. I provide an extremely simple explanation of this here.
What is a Neural Network?
A neural network is a type of AI that lets computers think and learn in a way modeled loosely on the human brain. The type we will use today, a sequential model, consists of layers of neurons: each layer passes its computed data to the next layer, and so on, until the data finally passes through the output layer, which narrows the possible results down to however many output classes (the desired number of possible outcomes) you want. This whole cycle begins at the input layer, which takes data of a given shape and passes it through to the rest of the layers.
train_images = train_images.reshape(train_images.shape[0], img_rows, img_cols, 1)
test_images = test_images.reshape(test_images.shape[0], img_rows, img_cols, 1)
# Adding print statements to see the new shapes.
print((train_images.shape, test_images.shape))
((60000, 28, 28, 1), (10000, 28, 28, 1))
Now, we need to know the input shape, which we will use when we configure the model.
What is an Input Shape?
An input shape defines the only shape of data that the input layer can accept into the neural network.
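The code later in this guide simply writes the shape out literally, so no variable is required; if you prefer to keep it in one place, a minimal sketch (the name input_shape is my own choice) looks like this:
# 28 rows, 28 columns, and one color channel (monochrome).
input_shape = (img_rows, img_cols, 1)  # (28, 28, 1)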
Now we will begin data cleaning, which means making the data easier for the model to process.
First, let’s plot the digit 5 as represented in the MNIST dataset:
plot_image(train_images, 100, train_labels)
This should output the following plot:
Now, let’s see what the numbers representing pixel intensity look like inside the image:
out = ""
for i in range(28):
for j in range(28):
f = int(train_images[100][i][j][0])
s = "{:3d}".format(f)
out += (str(s)+" ")
print(out)
out = ""
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 2 18 46 136 136 244 255 241 103 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 15 94 163 253 253 253 253 238 218 204 35 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 131 253 253 253 253 237 200 57 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 155 246 253 247 108 65 45 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 207 253 253 230 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 157 253 253 125 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 89 253 250 57 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 89 253 247 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 89 253 247 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 89 253 247 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 21 231 249 34 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 225 253 231 213 213 123 16 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 172 253 253 253 253 253 190 63 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 2 116 72 124 209 253 253 141 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 25 219 253 206 3 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 104 246 253 5 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 213 253 5 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 26 226 253 5 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 132 253 209 3 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 78 253 86 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
To help us visualize the data further, let's run the function below to show the minimum and maximum values in the data (the smallest and largest values it contains):
show_min_max(train_images, 100)
0 255
Now we can start the actual data cleaning. As you saw above, the data in the image is represented as an integer between zero and 255. While the network could learn on this data, let’s make it easier for the network by representing these values as a floating point number between zero and one. This keeps the numbers small for the neural network.
First things first, let's convert the data to floating-point numbers:
train_images = train_images.astype('float32')
test_images = test_images.astype('float32')
Now that the data is stored as floating-point numbers, we need to normalize it so the values run from 0 to 1 instead of 0 to 255. We can achieve this with some division:
train_images /= 255
test_images /= 255
Now we can see if any changes were made to the image:
plot_image(train_images, 100, train_labels)
The code above should output:
As you can see, no visible changes were made to the image. Now we will run the code below to check that the data was actually normalized:
out = ""
for i in range(28):
for j in range(28):
f = (train_images[100][i][j][0])
s = "{:0.1f}".format(f)
out += (str(s)+" ")
print(out)
out = ""
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.2 0.5 0.5 1.0 1.0 0.9 0.4 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.4 0.6 1.0 1.0 1.0 1.0 0.9 0.9 0.8 0.1 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 1.0 1.0 1.0 1.0 0.9 0.8 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.6 1.0 1.0 1.0 0.4 0.3 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.8 1.0 1.0 0.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.6 1.0 1.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 1.0 1.0 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.9 1.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.9 1.0 0.9 0.8 0.8 0.5 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.7 1.0 1.0 1.0 1.0 1.0 0.7 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.3 0.5 0.8 1.0 1.0 0.6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.9 1.0 0.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.8 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.9 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 1.0 0.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 1.0 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
As you can see, the image is not affected, but the data is easier for the neural network to deal with.
If we don't want to sift through all those numbers but still want to check that we cleaned the data correctly, we can look at the minimum and maximum values of the data:
print("The min and max are: ")
show_min_max(train_images, 100)
The min and max are:
0.0 1.0
We could start building the model now, but there is a problem we need to address first. MNIST's labels are simply the digits 0 to 9 because, well, the entire dataset is just handwritten digits 0 to 9. However, a neural network inherently treats numeric labels as ordered (for example, it assumes 1 is more similar to 2 than to 7, because mathematically 1 is closer to 2, even though a handwritten 7 looks more like a 1), which is wrong for our problem. To fix this, we convert the labels to a categorical format, one that Keras won't treat as ordered, so it views each digit independently:
train_labels = keras.utils.to_categorical(train_labels, num_classes)
test_labels = keras.utils.to_categorical(test_labels, num_classes)
This is also called One-Hot Encoding.
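To see what one-hot encoding does, here is a minimal, purely illustrative sketch (it is not part of the tutorial's pipeline): the label 3 becomes a vector of ten values with a single 1 in the position for digit 3.
# to_categorical turns a digit label into a one-hot vector.
print(keras.utils.to_categorical(3, num_classes))
# [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]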
Now, we can finally start building our model.
Each complete pass over the entire training dataset is called an epoch. Generally speaking, more epochs yield more accurate results, but take longer to train. Finding the balance between reasonable training time and good results is important when developing an AI model.
For now, we are just going to be training the model with ten epochs, but this number can be adjusted as you wish.
epochs = 10
In this tutorial, we will be making a sequential model. In the future, you may need to make other types of models.
Defining our model:
model = Sequential()
Now, we need to add the first layer (also called the input layer, as it takes input):
model.add(Flatten(input_shape= (28,28,1)))
That layer is a Flatten layer. It converts each 28 x 28 x 1 image into one long vector of 784 numbers, a form the rest of the network can understand. We prepared the data for this earlier. Because the layer does not know what shape the incoming data has, we have to specify it in the input_shape parameter.
Now, we can add the layers needed.
We will add a Dense layer below, which will perform predictions on the data. There is a lot we can configure here, and in the future, as a machine learning engineer, you will need to learn the proper configuration for your scenario. For now, we are going to use the ReLU activation function and put 16 neurons in this layer.
What is ReLU?
ReLU is an activation function that stands for Rectified Linear Unit. It is a simple nonlinear function: it passes positive values through unchanged and replaces negative values with 0. In other words, it returns max(0, x).
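As a quick illustration (not part of the tutorial's code), you can see the same behavior with NumPy:
# ReLU keeps positive values and clips negative values to zero.
values = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(np.maximum(0, values))  # [0.  0.  0.  1.5 3. ]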
model.add(Dense(units=16, activation='relu'))
Finally, we will add the output layer. Its job, as the name implies, is to shrink the number of possible outputs down to the number of output classes. Each output from this layer represents the AI's guess at how likely one particular class is to be correct (in computer vision terms, this is known as the confidence).
We will make sure the network shrinks this down to ten output classes (the possible outputs are the digits zero to nine) by putting ten neurons in the layer (as you probably guessed, each neuron outputs the confidence for one digit), and by using the Softmax activation function to do so.
What is Softmax?
Softmax is an activation function that rescales the outputs so that they are all between zero and one and sum to one. We use it for the final layer because this lets the network's output be interpreted as a probability distribution over the ten digits.
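Here is a minimal sketch of what softmax does to a vector of raw scores (again, purely illustrative and not part of the tutorial's code):
# Softmax exponentiates each score and divides by the total,
# so the results are positive and sum to 1.
scores = np.array([2.0, 1.0, 0.1])
softmax = np.exp(scores) / np.sum(np.exp(scores))
print(softmax)        # roughly [0.66 0.24 0.10]
print(softmax.sum())  # 1.0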
model.add(Dense(units=10, activation='softmax'))
Now, we can see an overview of what our model looks like:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 784) 0
dense (Dense) (None, 16) 12560
dense_1 (Dense) (None, 10) 170
=================================================================
Total params: 12,730
Trainable params: 12,730
Non-trainable params: 0
_________________________________________________________________
As you can see above, our model is sequential, has three layers, and already has 12,730 parameters to train. These are the numbers the network will adjust as it learns. This should be enough to correctly identify a hand-drawn digit.
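Where does 12,730 come from? Each Dense layer has one weight for every input-output pair plus one bias per neuron, so we can check the summary by hand (a quick illustrative calculation, not part of the tutorial's code):
# First Dense layer: 784 flattened inputs feeding 16 neurons, plus 16 biases.
dense_params = 784 * 16 + 16         # 12560
# Output layer: 16 inputs feeding 10 neurons, plus 10 biases.
output_params = 16 * 10 + 10         # 170
print(dense_params + output_params)  # 12730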
Now, we have to compile the network, giving TensorFlow the settings that control how the model will learn.
What do All the Arguments Mean?
- The Optimizer is an algorithm that, as you probably guessed from the name, optimizes some value. Optimizing a value can mean making it either as big as possible or as small as possible. In a neural network, we want to optimize the loss (a measure of how wrong the network's predictions are) by making it as small as possible. The optimizer is the function that does all this math behind the scenes. There are many optimizers, each with their own strengths and weaknesses. We will use Adam, a popular choice for image recognition because it is fast and lightweight.
- The Loss is the difference between a model's prediction and the actual label. There are many ways to calculate this, which is why it is important to choose the right one. The loss function you need varies based on what your neural network's output should look like. For now, we will use Categorical Cross Entropy.
- The Metrics. For convenience and to better visualize progress, TensorFlow lets the developer choose additional metrics to display alongside the loss during training. Accuracy, or what percentage of input images the model guessed correctly, is one such metric. It is related to loss but calculated in a separate way, so accuracy and loss won't necessarily add up to 100% or be direct inverses of each other.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
Once our model is compiled, we can fit it to the training data we prepared, training it to recognize digits.
The train_images array is the dataset of inputs given to the model, while train_labels is the answer key, used to keep track of whether the network's guesses were correct. The epochs argument is the number of epochs to run, set to the variable we defined earlier.
model.fit(train_images, train_labels, epochs=epochs, shuffle=True)
Epoch 1/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.4289 - accuracy: 0.8818
Epoch 2/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2530 - accuracy: 0.9291
Epoch 3/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2187 - accuracy: 0.9387
Epoch 4/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1968 - accuracy: 0.9440
Epoch 5/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1815 - accuracy: 0.9491
Epoch 6/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1687 - accuracy: 0.9514
Epoch 7/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1605 - accuracy: 0.9539
Epoch 8/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1524 - accuracy: 0.9560
Epoch 9/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1459 - accuracy: 0.9574
Epoch 10/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1402 - accuracy: 0.9590
Notice how, as the epochs progress, the loss goes down and the accuracy goes up. This is what we want!
However, these metrics are measured on data the model was trained on, where it has effectively been given the answers, so they can be misleading. To see how well the model really does, we need to evaluate it on test data: data the model has never seen before.
The <model>.evaluate
function takes the testing data, as well as the trained model, and evaluates the model, producing a set of metrics (also called scores) that show how well the model really did on unforeseen data.
Although the function is taking the test labels, the function never shows this data to the neural network, only using it to grade the neural network on how well it did.
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
313/313 - 0s - loss: 0.1657 - accuracy: 0.9528 - 347ms/epoch - 1ms/step
As you can see above, both numbers look fairly small. That is because they are reported as decimals. The accuracy of 0.9528 means the model classified 95.28% of the test images correctly, which is pretty good. The loss of 0.1657 is not a percentage; it is the categorical cross-entropy value, where lower is better.
Using Our Model
First download this image to the same folder as the Python file, and name it test.jpg.
Now, run the code below to predict our image using <model>.predict
:
path = "test.jpg"
img = tf.keras.preprocessing.image.load_img(path, target_size=(28,28), color_mode = "grayscale")
x = tf.keras.preprocessing.image.img_to_array(img)
true_label = 3
p_arr = predict_image(model, x)
plot_value_array(p_arr, true_label, 1)
Predicted Label: 2
...
It probably got the answer wrong. This is because the model is used to inverted images, meaning light handwriting on a dark background, while our photo has dark handwriting on a light background. To fix this, we simply need to invert the image colors:
x_inv = 255-x
And now we can run the prediction again:
arr = predict_image(model, x_inv)
plot_value_array(arr, 3, 1)
Predicted Label: 3
...
It probably got the answer correct. You have successfully built a neural network!
Exporting The Model
To export the model, simply run the code below (which saves it to a file called my_model.h5):
model.save('my_model.h5')
Now, if you ever want to use it again in another file, simply load the saved sequential model:
model = keras.models.load_model("my_model.h5", compile=False)
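Once loaded, the model can be used exactly as before. Here is a minimal sketch (assuming test.jpg is still in the same folder and the helper functions defined earlier are available):
# Re-run a prediction with the loaded model.
img = tf.keras.preprocessing.image.load_img("test.jpg", target_size=(28, 28), color_mode="grayscale")
x = tf.keras.preprocessing.image.img_to_array(img)
predict_image(model, 255 - x)  # remember to invert the colors first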
Flaws in Our Code
There are flaws in our model. Firstly, if you tried evaluating it on several of your own images, you may have noticed that it was not very accurate. This is because the model only handles images that closely resemble the ones it was trained on.
Because all of the training images were white on black, it has to do a lot of guessing when it gets confused by an image that is black on white.
We can fix this with convolutional neural networks.
A convolutional neural network recognizes the small parts and details of an image, will be much more accurate, and will handle more general data better.
Follow along for the next part, where I teach you how to optimize this with convolutional neural networks.