Tag: AI

  • Optimizing AI Models Using Convolutional Neural Networks

This guide is part two of a previous guide I made, called The Simple Guide to AI and Machine Learning With Python. It covers how to improve the accuracy of the model you made in that guide, so I'm going to assume you have already completed the previous guide before following this one.

In the previous guide, we learned how to use dense neural networks to make a program that recognizes handwriting. That neural network was not very accurate, though: it tended to get numbers wrong unless it was specifically tuned for them. Ideally, the network should recognize any number you give it without having to be optimized for every single digit it might encounter.

Convolutional neural networks were made to solve this problem. Rather than training on the overall image, a convolutional neural network recognizes tiny features in the image and learns those. For example, rather than focusing on the entire image of a hand-drawn three, the network will learn that a three has two curves stacked vertically, which helps it recognize any other three in the future, no matter how it is drawn and without the network being optimized specifically for the number three.

    Step One: Initial Setup

    For this step, we can just use the code that we used in the previous tutorial to prepare the MNIST dataset.

    Python
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras.datasets import mnist
    from tensorflow.keras import backend as K
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline
    
    # helper functions
    def show_min_max(array, i):
      random_image = array[i]
      print("min and max value in image: ", random_image.min(), random_image.max())
    
    
    def plot_image(array, i, labels):
      plt.imshow(np.squeeze(array[i]))
      plt.title(" Digit " + str(labels[i]))
      plt.xticks([])
      plt.yticks([])
      plt.show()
    
    def predict_image(model, x):
      x = x.astype('float32')
      x = x / 255.0
    
      x = np.expand_dims(x, axis=0)
    
      image_predict = model.predict(x, verbose=0)
      print("Predicted Label: ", np.argmax(image_predict))
    
      plt.imshow(np.squeeze(x))
      plt.xticks([])
      plt.yticks([])
      plt.show()
      return image_predict
    
    img_rows, img_cols = 28, 28  
    
    num_classes = 10 
    
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data() 
    (train_images_backup, train_labels_backup), (test_images_backup, test_labels_backup) = mnist.load_data() 
    
    print(train_images.shape) 
    print(test_images.shape) 
    
    train_images = train_images.reshape(train_images.shape[0], img_rows, img_cols, 1)
    test_images = test_images.reshape(test_images.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
    
    train_images = train_images.astype('float32')
    test_images = test_images.astype('float32')
    
    train_images /= 255
    test_images /= 255
    
    train_labels = keras.utils.to_categorical(train_labels, num_classes)
    test_labels = keras.utils.to_categorical(test_labels, num_classes)
    
    print(train_images[1232].shape)
    Expected Output
    (60000, 28, 28)
    (10000, 28, 28)
    (28, 28, 1)

With the initial setup of our code in place, we can jump straight to creating our network.

    Creating Our Network

As with the densely connected network, we will still train for a set number of epochs, that is, the number of complete passes the network makes over the training set.

    With that explanation out of the way, we can define our model.

    Python
    from tensorflow.keras.models import Sequential 
    from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Dropout
    
epochs = 10
    model = Sequential()

    Now, let’s start adding the layers of our neural network.

    Explaining Convolutional Layers

With our previous network, we added three dense (fully connected) layers. In a convolutional neural network, the layers work differently.

    Convolutional layers consist of groups of neurons called filters that move across the image and activate based on the pixels they read. Those groups will then learn how to recognize features in the data.

You can adjust the number and size of the filters in your neural network to your liking. Bigger filters observe larger parts of the image at once, while smaller filters gather finer details about the image. More filters mean the network can recognize a wider range of image features.

There are many advantages to having layers and filters work this way. For one thing, smaller filters can be more computationally efficient, since each one only examines a small part of the image at a time. Furthermore, because filters move across the entire image, the network is not affected by feature displacement (which occurs when a feature is common to two images but appears in different spots). And since each filter focuses on a small area, it is not distracted by the other parts of the image.
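
To make the sliding-filter idea concrete, here is a minimal NumPy sketch (purely illustrative; Keras handles all of this for us) of one 3×3 filter moving across a 28×28 image:

Python
import numpy as np

# Purely illustrative: one 3x3 filter sliding across a 28x28 image.
image = np.random.rand(28, 28)   # stand-in for a normalized MNIST digit
kernel = np.random.rand(3, 3)    # stand-in for a learned filter

# Without padding, a 3x3 filter fits (28 - 3 + 1) = 26 positions per axis.
feature_map = np.zeros((26, 26))
for i in range(26):
  for j in range(26):
    # Each output value is the filter applied to one 3x3 patch.
    feature_map[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(feature_map.shape)  # (26, 26)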

    We will be using multiple convolutional layers to complete our new-and-improved handwriting recognition software.

    Implementing Convolutional Layers

Keras makes it easy to create the convolutional layers we will use in our model. We will use the Conv2D function to create the first layer of our neural network.

In the case below, we will have 32 filters, a kernel size of (3,3), an input shape of (28,28,1) (which we saved to the input_shape variable when we ran the setup code at the beginning), and the ReLU activation function. I go more in-depth into what ReLU is in my previous guide.

    Python
model.add(Conv2D(filters=32, kernel_size=(3,3), activation='relu', input_shape=input_shape))

The Conv2D function creates 2D convolutional layers, meaning they scan across flat data such as images. With no padding, a 3×3 kernel over a 28×28 image produces a 26×26 output, which you will see in the model summary later.

    Explaining Pooling Layers

Convolutional layers can get quite computationally intensive, since increasing the number of neurons increases the computation required; this is where pooling layers come in. A pooling layer is essentially a filter that moves across the image in specified strides, reducing each window it covers to a single value (max pooling keeps the largest value in the window). Depending on the filter's size and stride, this shrinks the output image.

For this scenario, we will use a 2×2 filter with a stride of 2. This halves the image's row and column count (our 26×26 feature maps become 13×13), simplifying the data without losing too much specificity.

    Python
    model.add(MaxPooling2D(pool_size=(2,2)))

    Most networks have at least one set of alternating convolutional and pooling layers.

    More Convolutional Layers

The first convolutional layer examines the low-level features of an image. If we add more, the network can start working with higher-level features built on top of them.

We define this layer the same way we defined the previous one, except it has 64 filters instead of 32. We also do not specify the input shape, as it is inferred from the previous layer.

    Python
    model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu'))

    Dropout Layers

Dropout layers randomly deactivate a percentage of their input neurons, forcing the other neurons to adapt to the task. Without a dropout layer, a larger and more complicated network risks becoming too dependent on a single set of neurons rather than having all neurons learn. That over-dependence contributes to overfitting, which can change your network's output for the worse.

Below, our dropout layer will randomly deactivate 30% of its input neurons (a rate of 0.3).

    Python
    model.add(Dropout(rate=0.3))
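
The model summary printed later shows a third convolutional layer (conv2d_2, with a 9×9×32 output) sitting after the dropout layer. To make your summary and parameter counts match it, add one more convolutional layer here with 32 filters:

Python
model.add(Conv2D(filters=32, kernel_size=(3,3), activation='relu'))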

    Dense and Flatten Layers

After all the convolutional and pooling layers, we will need a layer to make the final decision. This will be a regular, fully connected dense layer. Before we can connect it, we need to flatten the stack of feature maps coming out of the filters into a single vector.

    We can start by flattening the image using the Keras Flatten layer.

    Python
    model.add(Flatten())

    Now, we can add a dense layer with ReLU activation and 32 neurons.

    Python
model.add(Dense(units=32, activation='relu'))


    Output Layers

Similar to the fully connected neural network we made in the previous guide, we need a final layer to shrink the previous dense layer's output down to the number of classes. As before, the class with the highest weight becomes the network's output.

    Below, we will add a dense layer to be our output layer. The number of neurons should be 10 because there are ten possible output classes, and the activation should use Softmax.

    Python
model.add(Dense(units=10, activation='softmax'))

    Model Summary

    Now, we can print out our model summary:

    Python
    model.summary()
Expected Output
    Model: "sequential"
    _________________________________________________________________
     Layer (type)                Output Shape              Param #   
    =================================================================
     conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                     
     max_pooling2d (MaxPooling2  (None, 13, 13, 32)        0         
     D)                                                              
                                                                     
     conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                     
     dropout (Dropout)           (None, 11, 11, 64)        0         
                                                                     
     conv2d_2 (Conv2D)           (None, 9, 9, 32)          18464     
                                                                     
     flatten (Flatten)           (None, 2592)              0         
                                                                     
     dense (Dense)               (None, 32)                82976     
                                                                     
     dense_1 (Dense)             (None, 10)                330       
                                                                     
    =================================================================
    Total params: 120586 (471.04 KB)
    Trainable params: 120586 (471.04 KB)
    Non-trainable params: 0 (0.00 Byte)
    _________________________________________________________________

    Compiling and Training

Now we will compile the network. The loss and metric will be the same ones we used in the previous guide, categorical cross-entropy and accuracy respectively. This time, however, we will use RMSProp (Root Mean Squared Propagation) as the training algorithm. RMSProp is one of many optimizers Keras can use to teach the network how to improve, adjusting its weights to make the loss as small as possible.

    Python
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

Now, we can start training. The fit function is the one that actually does the training. Let's look at its parameters:

    • train_images and train_labels state the data that this neural network model will be trained on. The images are the pieces of data given to the network, and the network tries to find out the appropriate label
    • batch_size allows us to put the network’s data into batches. We can always change it later, but for now we have set it to 64
    • epochs defines the number of epochs (times the network reiterates on the training data) the network should use
    • validation_data defines the data the model is testing itself on
• shuffle is turned on so Keras reshuffles the training data before each epoch, keeping the network from relying on the order of the data
    Python
    model.fit(train_images, train_labels, batch_size=64, epochs=epochs, validation_data=(test_images, test_labels), shuffle=True)
    Expected Output (may vary)
    Epoch 1/10
    938/938 [==============================] - 23s 24ms/step - loss: 0.1677 - accuracy: 0.9473 - val_loss: 0.0501 - val_accuracy: 0.9832
    Epoch 2/10
    938/938 [==============================] - 23s 24ms/step - loss: 0.0512 - accuracy: 0.9841 - val_loss: 0.0331 - val_accuracy: 0.9885
    Epoch 3/10
    938/938 [==============================] - 22s 23ms/step - loss: 0.0354 - accuracy: 0.9894 - val_loss: 0.0347 - val_accuracy: 0.9894
    Epoch 4/10
    938/938 [==============================] - 22s 24ms/step - loss: 0.0283 - accuracy: 0.9918 - val_loss: 0.0349 - val_accuracy: 0.9879
    Epoch 5/10
    938/938 [==============================] - 22s 23ms/step - loss: 0.0228 - accuracy: 0.9928 - val_loss: 0.0271 - val_accuracy: 0.9911
    Epoch 6/10
    938/938 [==============================] - 22s 24ms/step - loss: 0.0199 - accuracy: 0.9938 - val_loss: 0.0273 - val_accuracy: 0.9909
    Epoch 7/10
    938/938 [==============================] - 22s 23ms/step - loss: 0.0155 - accuracy: 0.9953 - val_loss: 0.0299 - val_accuracy: 0.9904
    Epoch 8/10
    938/938 [==============================] - 22s 24ms/step - loss: 0.0140 - accuracy: 0.9956 - val_loss: 0.0321 - val_accuracy: 0.9911
    Epoch 9/10
    938/938 [==============================] - 22s 24ms/step - loss: 0.0120 - accuracy: 0.9960 - val_loss: 0.0387 - val_accuracy: 0.9905
    Epoch 10/10
    938/938 [==============================] - 23s 25ms/step - loss: 0.0112 - accuracy: 0.9968 - val_loss: 0.0334 - val_accuracy: 0.9918

    Model Evaluation

Now, we have to test the model on data it hasn't seen yet. To do this, we will use the evaluate function. The accuracy comes back as a decimal fraction (0.5 means 50%), while the loss is the raw cross-entropy value, where lower is better.

    Python
    test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
    Expected Output (may vary)
    313/313 - 1s - loss: 0.0334 - accuracy: 0.9918 - 886ms/epoch - 3ms/step

    In our case above, the accuracy was 99.18%, which is pretty good.

    Exporting Our Model

    Now, we can export the model to be used elsewhere. We can do this by using model.save.

    Python
    model.save('cnn_model.h5')

    This will save the model to a file called “cnn_model.h5”, where it can then be loaded in other pieces of code.
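
For example, mirroring the load_model call from the previous guide, another script could reload it like this:

Python
from tensorflow import keras

model = keras.models.load_model('cnn_model.h5', compile=False)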

  • The Simple Guide to AI and Machine Learning With Python

In this guide, you will learn how to create an AI that recognizes handwriting with Python, using dense neural networks and the MNIST dataset. We will use TensorFlow to train the AI; basic knowledge of the linear algebra used in AI is strongly recommended, and you can refer to this guide for an introduction. In the next part, we upgrade the neural network's accuracy using convolutional neural networks.

    Prerequisites

To follow along, you will first need to install Python and make sure pip is on your path (via the .bashrc file on Linux, or the environment variables on Windows and Mac). Then, run the commands below to install the required libraries:

Shell
    pip install "tensorflow<2.11"
    pip install pandas openpyxl numpy matplotlib

Note: If installing TensorFlow this way does not work, you can run pip install tensorflow instead. Everything will function as normal, but TensorFlow may not be able to utilize your GPU.

    Writing The Code

    In a new Python file, we will first import the dataset and import the libraries needed:

    Python
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras.datasets import mnist
    from tensorflow.keras import backend as K
    import numpy as np
    import matplotlib.pyplot as plt
    from tensorflow.keras.models import Sequential 
    from tensorflow.keras.layers import Dense, Flatten

We then define some functions that will help us visualize the data later on in the code. I will not go over how they work; they are not a necessity, just there to help us see what is happening:

    Python
def show_min_max(array, i):
  # Print the smallest and largest pixel values in image i.
  random_image = array[i]
  print(random_image.min(), random_image.max())

def plot_image(array, i, labels):
  # Display image i from the array, titled with its label.
  plt.imshow(np.squeeze(array[i]))
  plt.title(" Digit " + str(labels[i]))
  plt.xticks([])
  plt.yticks([])
  plt.show()
  
def predict_image(model, x):
  # Scale a single image, predict its digit, and display both.
  x = x.astype('float32')
  x = x / 255.0

  x = np.expand_dims(x, axis=0)  # add a batch dimension for predict

  image_predict = model.predict(x, verbose=0)
  print("Predicted Label: ", np.argmax(image_predict))

  plt.imshow(np.squeeze(x))
  plt.xticks([])
  plt.yticks([])
  plt.show()
  return image_predict
  

def plot_value_array(predictions_array, true_label, h):
  # Bar chart of the model's confidence in each digit class;
  # the predicted label is drawn red and the true label blue.
  plt.grid(False)
  plt.xticks(range(10))
  plt.yticks([])
  thisplot = plt.bar(range(10), predictions_array[0], color="#777777")
  plt.ylim([(-1*h), h])
  predicted_label = np.argmax(predictions_array)
  thisplot[predicted_label].set_color('red')
  thisplot[true_label].set_color('blue')
  plt.show()

In the MNIST dataset (the dataset we will be using), there are 60,000 training images and 10,000 test images, each 28 x 28 pixels. There are 10 possible outputs (or, to be more technical, output classes) and one color channel, meaning each image is stored as a 28 x 28 grid of numbers between 0 and 255 and is monochrome.

    We can use this data to set some variables:

    Python
    img_rows = 28 # Rows in each image
    img_cols = 28 # Columns in each image
    num_classes = 10 # Output Classes

Now, we will load the training images and labels, along with another set of images and labels used for evaluating the model's performance after we train it (these are called test images/labels).

    What Are Images and Labels?

These can also be called data and labels. The data is the context the computer is given, while the labels are the correct answers it should predict from that data. Most of the time, the model tries to predict the labels based on the data it is given.

    Python
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data()

The next step is not required, and we don't make use of it later in the code, but it is recommended, especially if you are using a Python notebook: create a duplicate, untouched version of the train and test data as a backup.

    Python
    (train_images_backup, train_labels_backup), (test_images_backup, test_labels_backup) = mnist.load_data()

    Now, we test to see if we loaded the data correctly:

    Python
    print((train_images.shape, test_images.shape))
    Expected Output
    ((60000, 28, 28), (10000, 28, 28))
    Why Are They Those Shapes?

The images are 28×28, which explains the last two dimensions in each shape. And as I said earlier, there are 60,000 training images and 10,000 testing images, which explains the first dimension of each tensor. Note that each image is currently stored as a flat grid of pixel values with no channel dimension; this is not readable by our neural network yet, and we will fix it shortly.

The whole purpose of this tutorial is to get you comfortable with machine learning, so I am going to let you in on a fact: data can come formatted in many different ways, and it is up to you to understand how to get your datasets to work with your model.

Because the MNIST dataset was made for this purpose, it is ready to use, and little to no reshaping or reformatting is needed.

However, you might come across data that is not so well formatted or ready for your machine learning model or scenario.

    It is important to develop this skill, as in your machine learning career, you are going to have to deal with different types of data.

Now, let's do the only reshaping we really need to do: reshaping the data to fit our neural network's input layer by converting it from a flat matrix of pixel values into readable images. We do this by adding the number of color channels as a dimension, and because the images are monochrome, that dimension is just 1.

    What is a Shape in Neural Networks?

    A shape is the size of the linear algebra object you want to represent in code. I provide an extremely simple explanation of this here.

    What is a Neural Network?

A neural network is a type of AI that lets computers think and learn in a way loosely modeled on the human brain. The type we will use today, a sequential model, consists of layers of neurons, each of which passes its computed data on to the next layer, and so on, until the data reaches the output layer, which narrows the possible results down to however many output classes (desired possible outcomes) you want. The cycle begins at the input layer, which takes in data of a given shape and passes it through to the rest of the layers.

    Python
    train_images = train_images.reshape(train_images.shape[0], img_rows, img_cols, 1)
    test_images = test_images.reshape(test_images.shape[0], img_rows, img_cols, 1)
    # Adding print statements to see the new shapes.
    print((train_images.shape, test_images.shape))
    Expected Output
    ((60000, 28, 28, 1), (10000, 28, 28, 1))

    Now, we define the input shape, to be used when we define settings for the model.

    What is an Input Shape?

    An input shape defines the only shape that the input layer is capable of taking into the neural network.
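
Part two of this series recaps this same setup and stores the shape in a variable; we can define it the same way here:

Python
input_shape = (img_rows, img_cols, 1)  # rows, columns, color channels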

We will now begin data cleaning, that is, making the data easier for the model to process.

    First, let’s plot the digit 5 as represented in the MNIST dataset:

    Python
    plot_image(train_images, 100, train_labels)

This should output a plot of the hand-drawn digit 5.

    Now, let’s see what the numbers representing pixel intensity look like inside the image:

    Python
    out = ""
    for i in range(28):
      for j in range(28):
        f = int(train_images[100][i][j][0])
        s = "{:3d}".format(f)
        out += (str(s)+" ")
      print(out)
      out = ""
Expected Output
      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0   0   0   2  18  46 136 136 244 255 241 103   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0  15  94 163 253 253 253 253 238 218 204  35   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0 131 253 253 253 253 237 200  57   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0 155 246 253 247 108  65  45   0   0   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0 207 253 253 230   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0 157 253 253 125   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0  89 253 250  57   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0  89 253 247   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0  89 253 247   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0  89 253 247   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0  21 231 249  34   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0 225 253 231 213 213 123  16   0   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0 172 253 253 253 253 253 190  63   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0   2 116  72 124 209 253 253 141   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  25 219 253 206   3   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 104 246 253   5   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 213 253   5   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  26 226 253   5   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 132 253 209   3   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  78 253  86   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 

To help us visualize the data another way, let's run the function below to show the minimum and maximum values in the image (the largest and smallest pixel values in the data):

    Python
    show_min_max(train_images, 100)
    Expected Output
    0 255

Now we can start the actual data cleaning. As you saw above, each pixel in the image is represented as an integer between 0 and 255. While the network could learn on this data, let's make things easier for it by representing these values as floating-point numbers between zero and one. This keeps the numbers small for the neural network.


First things first, let's convert the data to floating-point numbers:

    Python
    train_images = train_images.astype('float32')
    test_images = test_images.astype('float32')

Now that the data is stored as floating-point numbers, we need to normalize it so it ranges from 0 to 1 rather than 0 to 255. We can achieve this with some division:

    Python
train_images /= 255
test_images /= 255

    Now we can see if any changes were made to the image:

    Python
    plot_image(train_images, 100, train_labels)

The code above should output the same plot as before; as you can see, no visible changes were made to the image. Now we will run the code below to check whether the data was actually normalized:

    Python
    out = ""
    for i in range(28):
      for j in range(28):
        f = (train_images[100][i][j][0])
        s = "{:0.1f}".format(f)
        out += (str(s)+" ")
      print(out)
      out = ""
Expected Output
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.2 0.5 0.5 1.0 1.0 0.9 0.4 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.4 0.6 1.0 1.0 1.0 1.0 0.9 0.9 0.8 0.1 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 1.0 1.0 1.0 1.0 0.9 0.8 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.6 1.0 1.0 1.0 0.4 0.3 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.8 1.0 1.0 0.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.6 1.0 1.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 1.0 1.0 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.9 1.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.9 1.0 0.9 0.8 0.8 0.5 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.7 1.0 1.0 1.0 1.0 1.0 0.7 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.3 0.5 0.8 1.0 1.0 0.6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.9 1.0 0.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.8 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.9 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 1.0 0.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 1.0 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

    As you can see, the image is not affected, but the data is easier for the neural network to deal with.

If we don't want to sift through all those numbers but still want to check that we cleaned the data correctly, we can look at the minimum and maximum values of the data:

    Python
    print("The min and max are: ")
    show_min_max(train_images, 100)
Expected Output
    The min and max are: 
    0.0 1.0

We could start building the model now, but there is a problem we need to address first. MNIST's labels are simply the digits 0 to 9, because the entire dataset is just handwritten digits 0 to 9. However, if we keep the labels as plain numbers, the network inherently treats them as ordered; it assumes 1 is more similar to 2 than to 7, because mathematically it is, even though a handwritten 7 can look more like a 1. For classification, that assumption is wrong. To fix this, we convert the labels to a categorical format, one that Keras won't treat as ordered, making it view each digit independently:

    Python
    train_labels = keras.utils.to_categorical(train_labels, num_classes) 
    test_labels= keras.utils.to_categorical(test_labels, num_classes)

    This is also called One-Hot Encoding.
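
To see what the encoding does, you can try it on a single label (a quick check, not part of the pipeline):

Python
# The label 3 becomes a 10-element vector with a 1 in position 3.
print(keras.utils.to_categorical(3, num_classes))
# Expected: [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]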

    Now, we can finally start building our model.

Each complete pass over the entire training dataset is called an epoch. Generally speaking, more epochs yield more accurate results but take longer to train. Finding the balance between reasonable time and good results is important when developing an AI model.

    For now, we are just going to be training the model with ten epochs, but this number can be adjusted as you wish.

    Python
    epochs = 10

    In this tutorial, we will be making a sequential model. In the future, you may need to make other types of models.

    Defining our model:

    Python
    model = Sequential()

    Now, we need to add the first layer (also called the input layer, as it takes input):

    Python
model.add(Flatten(input_shape=(28,28,1)))

That layer is a Flatten layer. It converts each 28×28×1 image into a flat vector of 784 numbers, a form the rest of the network can understand; we prepared the data for this earlier. Because the layer does not know what shape the data comes in, we have to specify it in the input_shape parameter.

    Now, we can add the layers needed.

    We will add a Dense layer below, which will perform predictions on the data. We can configure a lot here, and in the future as a machine learning engineer, you will need to learn what the proper configurations for your scenario are. For now, we are going to use the activation function ReLU and put 16 neurons in this layer.

    What is ReLU?

ReLU is an activation function that stands for Rectified Linear Unit. It computes f(x) = max(0, x): positive inputs pass through unchanged, while negative inputs become 0. This simple nonlinearity is what lets stacked layers learn more than a plain linear model could.
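
A quick NumPy check of that behavior (illustrative only, not part of the model):

Python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(np.maximum(0, x))  # [0.  0.  0.  1.5 3. ] -- negatives clamp to zero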

    Python
    model.add(Dense(units=16, activation='relu'))

Finally, we will add the output layer. Its job, as implied by the name, is to shrink the number of possible outputs down to the number of output classes. Each output from this layer represents the AI's guess at how likely that class is to be the correct answer (in computer vision terms, this is known as the confidence).

We will have the network shrink this down to ten output classes (the possible outputs are the digits zero to nine) by putting ten neurons in the layer, one per digit, and by using the Softmax activation function.

    What is Softmax?

Softmax is an activation function that distributes the outputs so that they all sum to one. We use it for the final layer because our neural network's output can then be interpreted as a probability distribution.
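
As a quick illustration of that property (not part of the model), here is softmax computed by hand in NumPy:

Python
import numpy as np

z = np.array([2.0, 1.0, 0.1])            # raw outputs from a layer
softmax = np.exp(z) / np.sum(np.exp(z))  # exponentiate, then normalize
print(softmax)        # roughly [0.66 0.24 0.10]
print(softmax.sum())  # 1.0 (up to floating-point rounding)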

    Python
    model.add(Dense(units=10, activation='softmax'))

    Now, we can see an overview of what our model looks like:

    Python
    model.summary()
Expected Output
    Model: "sequential"
    _________________________________________________________________
     Layer (type)                Output Shape              Param #   
    =================================================================
     flatten (Flatten)           (None, 784)               0         
                                                                     
     dense (Dense)               (None, 16)                12560     
                                                                     
     dense_1 (Dense)             (None, 10)                170       
                                                                     
    =================================================================
    Total params: 12,730
    Trainable params: 12,730
    Non-trainable params: 0
    _________________________________________________________________

As you saw above, our model is sequential, has three layers, and already has 12,730 trainable parameters. This means the network will adjust 12,730 numbers as it trains. That should be enough to correctly identify a hand-drawn number.

Now, we have to compile the network, telling TensorFlow how we want the training to be set up.

    What do All the Arguments Mean?
    • The Optimizer is an algorithm that, as you probably guessed from the name, optimizes some value. Optimizing a value can mean either making it as big as possible or as small as possible. In a neural network, we want to optimize the loss (or how many times the neural network got the data wrong) by making it as small as possible. The optimizer is the function that does all this math behind the scenes. There are many functions for this, each with their own strengths or weaknesses. We will use Adam, a popular one for image recognition as it is fast and lightweight.
• The Loss is the difference between a model's prediction and the actual label. There are many ways to calculate this, which is why it is important to choose the right one; the loss function you need varies based on what your neural network's output should look like. For now, we will use Categorical Cross Entropy.
• The Metrics. For convenience and to better visualize training, TensorFlow lets the developer choose additional metrics to display alongside those already shown during training. Accuracy, or what percent of input images the model guessed correctly, is one such metric. It is related to loss but calculated separately, so accuracy and loss won't necessarily add up to 100% or be direct inverses of each other.
    Python
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Once our model is compiled, we can fit it to the training data we prepared, training it in a way that lets it recognize numbers. train_images is the dataset of inputs given to the model, while train_labels holds the answers, helping Keras keep track of whether each of the network's guesses was correct. epochs is the number of epochs to run, set to the variable we defined earlier.

    Python
    model.fit(train_images, train_labels, epochs=epochs, shuffle=True)
    Expected Output (may vary)
    Epoch 1/10
    1875/1875 [==============================] - 2s 1ms/step - loss: 0.4289 - accuracy: 0.8818
    Epoch 2/10
    1875/1875 [==============================] - 2s 1ms/step - loss: 0.2530 - accuracy: 0.9291
    Epoch 3/10
    1875/1875 [==============================] - 2s 1ms/step - loss: 0.2187 - accuracy: 0.9387
    Epoch 4/10
    1875/1875 [==============================] - 2s 1ms/step - loss: 0.1968 - accuracy: 0.9440
    Epoch 5/10
    1875/1875 [==============================] - 2s 1ms/step - loss: 0.1815 - accuracy: 0.9491
    Epoch 6/10
    1875/1875 [==============================] - 2s 1ms/step - loss: 0.1687 - accuracy: 0.9514
    Epoch 7/10
    1875/1875 [==============================] - 2s 1ms/step - loss: 0.1605 - accuracy: 0.9539
    Epoch 8/10
    1875/1875 [==============================] - 2s 1ms/step - loss: 0.1524 - accuracy: 0.9560
    Epoch 9/10
    1875/1875 [==============================] - 2s 1ms/step - loss: 0.1459 - accuracy: 0.9574
    Epoch 10/10
    1875/1875 [==============================] - 2s 1ms/step - loss: 0.1402 - accuracy: 0.9590

Notice how, as the epochs progress, the loss goes down and the accuracy goes up. This is exactly what we want!

However, these metrics were measured on data the model trained on; since we were handing the model the answers, they can be misleading. To see how well the model really does, we need to evaluate it on test data, data it has never seen before.

The <model>.evaluate function takes the testing data, runs it through the trained model, and produces a set of metrics (also called scores) that show how well the model really does on unseen data.

Although the function takes the test labels, it never shows them to the neural network; it only uses them to grade how well the network did.

    Python
    test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
    Expected Output (may vary)
    313/313 - 0s - loss: 0.1657 - accuracy: 0.9528 - 347ms/epoch - 1ms/step

At first glance, both the loss and the accuracy above look low. That is because the accuracy is stored as a decimal: 0.9528 means 95.28%, which is pretty good. The loss of 0.1657, on the other hand, is not a percentage but the raw cross-entropy value, and lower is better.

    Using Our Model

    First download this image to the same folder as the Python file, and name it test.jpg.

    Now, run the code below to predict our image using <model>.predict:

    Python
    path = "test.jpg"
    
    img = tf.keras.preprocessing.image.load_img(path, target_size=(28,28), color_mode = "grayscale")
    x = tf.keras.preprocessing.image.img_to_array(img)
    true_label = 3
    p_arr = predict_image(model, x)
    plot_value_array(p_arr, true_label, 1)
    Expected Output (may vary)
    Predicted Label: 2
    ...

It probably got the answer wrong. This is because the model is used to inverted images, meaning light handwriting on a dark background, while our test image is dark-on-light. To fix this, we simply need to invert the image colors:

    Python
    x_inv = 255-x

    And now we can run the prediction again:

    Python
    arr = predict_image(model, x_inv)
    plot_value_array(arr, 3, 1)
    Expected Output (may vary)
    Predicted Label: 3
    ...

    It probably got the answer correct. You have successfully built a neural network!

    Exporting The Model

To export the model, simply run the code below (which saves it to a file called my_model.h5):

    Python
    model.save('my_model.h5')

    Now if you ever want to refer to it again in another file, simply load in the sequential model:

    Python
    model = keras.models.load_model("my_model.h5", compile=False)

    Flaws in Our Code

There are flaws in our model. First, if you tried evaluating it on multiple images, you may have noticed that it was not very accurate. That is because, for this model to recognize a given image reliably, we would have to optimize it for that image.

Because all of the training images were white-on-black, the model has to do a lot of guessing when it is handed an image that is black-on-white.

    We can fix this with convolutional neural networks.

A convolutional network recognizes the small parts and details of an image, so it will be much more accurate and handle more general data better.

    Follow along for the next part, where I teach you how to optimize this with convolutional neural networks.

• Tensor Dimensions and Basics in Python Artificial Intelligence and Machine Learning

In PyTorch and TensorFlow, tensors are a very popular way of storing large amounts of data in artificial intelligence projects. Here, I will show you what they are and how they work.

    What makes a Tensor?

Tensors are made up of scalars, vectors, and matrices. Scalars are single numbers, vectors are lines of numbers, and matrices are, as the name suggests, tables of numbers.

Here is an example: if you are making an image, you can think of a matrix as the image, scalars as pixels or dots, and vectors as rows. You can then think of a tensor as a matrix that contains matrices.

(The original post includes a color-coded diagram here: the main tensor in yellow, its two matrices in red and cyan, their vectors in orange, and individual scalars in green.)

Matrix Dimension

Matrices are tables of numbers, so the number of rows and columns in a matrix is its dimension. Below is an example.

1 2
3 4

    There are two rows and two columns in this table of numbers or matrix, so the dimensions of this matrix are two by two. Below is another example.

1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4

The matrix above has four rows and four columns, so it is a four-by-four matrix.
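
If you want to check a matrix's dimensions in code, NumPy reports them as the array's shape (a quick sketch):

Python
import numpy as np

two_by_two = np.array([[1, 2], [3, 4]])
print(two_by_two.shape)  # (2, 2): two rows, two columns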

    Tensor Dimension

The dimensions of the tensors we are describing here are made up of three numbers. Earlier in this post, I mentioned that such a tensor is a matrix containing matrices. The first dimension is how many matrices the tensor holds, and the next two dimensions are the dimensions of each of those matrices. For example, the matrix

 1  2  3  4
 5  6  7  8
 9 10 11 12
13 14 15 16

would be a 4×4 matrix. If you wanted four of these four-by-four matrices, you would make the first dimension (the number of matrices in the tensor) four, and then input the next two dimensions as 4 and 4, giving a 4×4×4 tensor.
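
Here is that example in NumPy (a quick sketch): four 4×4 matrices stacked into one tensor.

Python
import numpy as np

tensor = np.zeros((4, 4, 4))  # 4 matrices, each with 4 rows and 4 columns
print(tensor.shape)     # (4, 4, 4)
print(tensor[0].shape)  # (4, 4) -- the first matrix inside the tensor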

    Tips

• If you do not input the first dimension (the number of matrices in the tensor), it defaults to 1.
• Tensors are useful for storing mass amounts of data.
• One of the easiest ways to make a tensor with custom values is to loop over every scalar in the tensor, setting each one to a value you choose.
• Tensors are often stored unevaluated. Rather than holding your actual data (typically numbers) raw, the framework can hold an unevaluated expression describing how to compute it, which is much easier on the machine's memory and is part of what makes tensors so popular for storing mass data. If you want to see the actual, concrete values of a tensor, you must evaluate it, which you can do with a simple call in both PyTorch and TensorFlow; see the sketch below.
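
As a minimal sketch (assuming TensorFlow 2.x, where calling .numpy() evaluates a tensor into its concrete values):

Python
import tensorflow as tf

t = tf.constant([[1, 2], [3, 4]])
print(t)          # the Tensor object, with shape and dtype metadata
print(t.numpy())  # the evaluated raw values, as a NumPy array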