Handwritten Digit Prediction using Convolutional Neural Networks in TensorFlow with Keras and Live Example using TensorFlow.js

Whenever we start learning a new programming language, we start with a Hello World program. Likewise, most AI/ML developers say, “Just like programming has Hello World, machine learning has MNIST”.

Like everyone, I wanted to start there. In fact, I wanted to write my first ML article on MNIST, but that didn’t sound exciting because the internet already has loads of MNIST articles. I wanted my article to be different from the others, so I thought: along with the code, why not share a live example as well?

Let’s get started. I hope you have TensorFlow and Keras installed on your system; if not, please read my previous article, which has installation instructions.

First, let’s import all the required libraries.

import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.optimizers import Adam
from keras.utils import np_utils

Next, let’s load the MNIST data provided by Keras.

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

The datasets (training and test) are 3D arrays. The training set has shape (60000, 28, 28) and the test set has shape (10000, 28, 28).

A CNN in Keras expects a 4D input array of shape (batch, height, width, channels). The channels dimension is the number of color channels: 1 for grayscale images and 3 for color (RGB) images. Our images are grayscale, so we use 1. The code below reshapes our inputs.

# Reshaping to format which CNN expects (batch, height, width, channels)
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], X_train.shape[2], 1).astype('float32')
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], X_test.shape[2], 1).astype('float32')
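
To see what the reshape does without downloading MNIST, here is a minimal sketch on a made-up array. The name `images` and the batch of 5 blank images are assumptions for illustration only; the reshape itself is the same call used above.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 5 blank 28x28 grayscale images (uint8, like the real data).
images = np.zeros((5, 28, 28), dtype=np.uint8)

# Append a trailing channels axis of size 1 and switch to float32.
reshaped = images.reshape(images.shape[0], images.shape[1], images.shape[2], 1).astype('float32')

# np.expand_dims is an equivalent way to add the same axis.
expanded = np.expand_dims(images, axis=-1).astype('float32')

print(images.shape)    # shape before: (5, 28, 28)
print(reshaped.shape)  # shape after:  (5, 28, 28, 1)
```

The pixel values are untouched; only the shape and the data type change.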

It’s always good to normalize the data. Each pixel in our datasets has a value between 0 and 255, so we scale it to the range 0–1 using the code below.

# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
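
The scaling relies on the arrays already being float32, which is why the reshape step casts them. A small sketch on a made-up three-pixel array (the name `pixels` is just for illustration) shows the difference:

```python
import numpy as np

# Made-up pixel values standing in for part of an MNIST image (uint8, 0-255).
pixels = np.array([0, 128, 255], dtype=np.uint8)

# Cast first, then divide: the result is float32 in the range 0-1.
scaled = pixels.astype('float32') / 255
print(scaled.dtype, scaled.min(), scaled.max())

# Dividing the uint8 array in place fails, because the floating-point
# result cannot be stored back into an integer array.
try:
    pixels /= 255
    in_place_failed = False
except TypeError:
    in_place_failed = True
print(in_place_failed)
```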

Our output is a digit from 0 to 9, so this is a multi-class classification problem. The labels are categories rather than quantities (as far as the classifier is concerned, no digit is “greater” than another), so it’s better to use one-hot encoding. One-hot encoding transforms each integer label into a binary vector that contains a single ‘1’ and ‘0’ everywhere else.

For example, if the expected output is 8, its one-hot encoding is [0,0,0,0,0,0,0,0,1,0].

# one hot encode
number_of_classes = 10
y_train = np_utils.to_categorical(y_train, number_of_classes)
y_test = np_utils.to_categorical(y_test, number_of_classes)
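
For intuition, the same encoding can be sketched in plain NumPy. The function name `one_hot` is my own and is not part of Keras; it is only meant to show what the encoding produces.

```python
import numpy as np

def one_hot(labels, number_of_classes):
    """Return a (len(labels), number_of_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, number_of_classes), dtype='float32')
    # Row i gets a 1 in the column given by labels[i].
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

encoded = one_hot([8, 0, 3], 10)
print(encoded[0])  # [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
```

Taking the position of the 1 in each row (for example with `np.argmax`) recovers the original label.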

Now let’s build the model.

# create model
model = Sequential()
model.add(Conv2D(32, (5, 5), input_shape=(X_train.shape[1], X_train.shape[2], 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Randomly drop 20% of the activations during training
model.add(Dropout(0.2))
# Flatten the feature maps into a vector for the fully connected layers
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(number_of_classes, activation='softmax'))

Let’s understand the above code step by step.

  1. The first hidden layer is a convolutional layer, Conv2D. It has 32 filters (output channels) of size 5×5 and uses the ReLU activation function. Because it is the first layer, it also declares the input shape outlined above (height, width, channels).
  2. The second layer is a MaxPooling2D layer. It down-samples its input by keeping only the maximum value in each 2×2 window, which makes the detected features less sensitive to their exact position and helps reduce overfitting. It also shrinks the feature maps passed to later layers, which reduces the number of parameters to learn and the training time.
  3. Another convolutional layer, with 32 filters (output channels) of size 3×3 and the ReLU activation function.
  4. Another MaxPooling2D layer.
  5. The next layer is a regularization layer called Dropout. It is configured to randomly exclude 20% of the neurons in the layer during training in order to reduce overfitting.
  6. The next layer, Flatten, converts the multi-dimensional feature maps into a vector. This allows the output to be processed by standard fully connected layers.
  7. The next layer is a fully connected layer with 128 neurons and the ReLU activation function.
  8. The last layer is the output layer, with 10 neurons (one per output class) and the softmax activation function. Each neuron gives the probability of its class. Softmax is used because this is a multi-class classification; for a binary classification we would use the sigmoid activation function.
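
To make the steps above concrete, the sketch below traces the feature-map size through the layers and counts the trainable parameters by hand. It assumes the Keras defaults for these layers ('valid' padding and stride 1 for the convolutions, pooling stride equal to the pool size); the helper names are my own.

```python
def conv_output(size, kernel):
    # 'valid' padding with stride 1: the kernel must fit entirely inside the image.
    return size - kernel + 1

def pool_output(size, pool):
    # Non-overlapping windows: the stride equals the pool size.
    return size // pool

side = 28
side = conv_output(side, 5)   # 24 after the 5x5 convolution
side = pool_output(side, 2)   # 12 after the first 2x2 max pooling
side = conv_output(side, 3)   # 10 after the 3x3 convolution
side = pool_output(side, 2)   # 5 after the second 2x2 max pooling
flattened = side * side * 32  # 5 * 5 * 32 = 800 values after Flatten

# Parameters per layer: weights plus one bias per filter or neuron.
# Pooling, Dropout and Flatten have no trainable parameters.
params = {
    'conv1': 5 * 5 * 1 * 32 + 32,      # 832
    'conv2': 3 * 3 * 32 * 32 + 32,     # 9248
    'dense1': flattened * 128 + 128,   # 102528
    'dense2': 128 * 10 + 10,           # 1290
}
total = sum(params.values())           # 113898
print(flattened, total)
```

Most of the parameters sit in the first fully connected layer, which is why the pooling layers before it matter so much for model size.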

Let’s compile the model. I used categorical_crossentropy as the loss function because it’s a multi-class classification problem. I used Adam as the optimizer, which updates the weights during training to reduce the loss. I used accuracy as the metric, so the fraction of correctly classified images is reported during training and evaluation.

# Compile model
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
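
To show how the softmax output and this loss fit together, here is a simplified single-sample sketch in NumPy. It is not the Keras implementation (which also handles batches and clips values); the function names and the raw scores in `logits` are made up for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability; the result is unchanged.
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

def categorical_crossentropy(one_hot_target, probabilities):
    # Only the probability assigned to the true class contributes to the loss.
    eps = 1e-7  # avoids log(0)
    return float(-np.sum(one_hot_target * np.log(probabilities + eps)))

# Made-up raw scores for the 10 digit classes; class 8 has the highest score.
logits = np.array([1.0, 2.0, 0.5, 0.1, 0.0, 0.3, 0.2, 0.4, 5.0, 0.6])
probabilities = softmax(logits)

target_8 = np.zeros(10)
target_8[8] = 1.0
target_3 = np.zeros(10)
target_3[3] = 1.0

loss_if_true_label_is_8 = categorical_crossentropy(target_8, probabilities)
loss_if_true_label_is_3 = categorical_crossentropy(target_3, probabilities)
print(loss_if_true_label_is_8 < loss_if_true_label_is_3)  # True
```

The loss is small when the model puts high probability on the true class and large when it does not, which is the signal the optimizer uses to adjust the weights.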

It’s time to train the model. The model is fit over 10 epochs, and its weights are updated after every batch of 200 images. The test data is used as the validation dataset, allowing you to see the skill of the model as it trains.

# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200)
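
As a quick sanity check on what these settings mean, the arithmetic below works out how many weight updates the training run performs, using the training set size of 60000 mentioned earlier. The variable names are my own.

```python
import math

num_train_images = 60000
batch_size = 200
epochs = 10

# One weight update per batch; the last batch of an epoch may be smaller,
# hence the ceiling (here the division is exact).
updates_per_epoch = math.ceil(num_train_images / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch)  # 300
print(total_updates)      # 3000
```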

I want to test my trained model with my own images, so I save the model to my local hard disk.

# Save the model
model.save('models/mnistCNN.h5')

The test dataset is used to evaluate the model; after evaluation, the test loss and test accuracy are printed.

# Final evaluation of the model
metrics = model.evaluate(X_test, y_test, verbose=0)
print("Metrics(Test loss & Test Accuracy): ")
print(metrics)
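
The accuracy value reported here is the fraction of images whose most probable predicted class matches the true label. A minimal sketch of that calculation, on made-up predictions for a 3-class problem (the function and variable names are hypothetical):

```python
import numpy as np

def accuracy(y_true_one_hot, y_prob):
    # Compare the index of the 1 in each label row with the most probable class.
    true_labels = np.argmax(y_true_one_hot, axis=1)
    predicted_labels = np.argmax(y_prob, axis=1)
    return float(np.mean(true_labels == predicted_labels))

# Four made-up samples with true labels 0, 1, 2, 1.
y_true = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([
    [0.8, 0.1, 0.1],  # predicted 0, correct
    [0.2, 0.7, 0.1],  # predicted 1, correct
    [0.3, 0.3, 0.4],  # predicted 2, correct
    [0.6, 0.3, 0.1],  # predicted 0, wrong
])

print(accuracy(y_true, y_prob))  # 0.75
```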

I got around 99.19% accuracy. You will find this example code under the name mnistCNN.py in my GitHub repository.

After completing this I wasn’t satisfied, because the model had only run on the data provided by Keras. I wanted to verify my trained model on my own data, so I drew a few images myself, stored them in my data folder, and checked them with my model. The results looked decent. Here is the code for this:

# Importing the Keras libraries and packages
from keras.models import load_model
from PIL import Image
import numpy as np

model = load_model('models/mnistCNN.h5')

for index in range(10):
    # Load data/<index>.png as an 8-bit grayscale image
    img = Image.open('data/' + str(index) + '.png').convert("L")
    img = img.resize((28,28))
    im2arr = np.array(img)
    # Scale to 0-1 to match the training data
    im2arr = im2arr.reshape(1,28,28,1).astype('float32') / 255
    # Predicting the Test set results
    y_pred = model.predict(im2arr)
    print(y_pred)

You will find the above code, images, and model file in my GitHub repository. To run the code you need the Pillow package, which you can install with the command below.

pip3 install pillow

But I still wasn’t satisfied, so I decided to do something more. We all know Google introduced TensorFlow.js, and I read that it can use our existing model as well. So I thought, why not build a small page for this example? From here the journey became more exciting.

First, we need a canvas where the user can draw a number. For this, I wrote an HTML page with the help of this article.

Now we want to use our model in the browser, so we need to convert it into a format that TensorFlow.js can consume. For this task, this article helped me. Converting a Keras model requires tensorflowjs_converter, which comes with the tensorflowjs package.

pip3 install tensorflowjs

I used the command below to convert the model.

tensorflowjs_converter --input_format keras models/mnistCNN.h5 models/

This creates a model file and a few supporting weight files in the models folder, named model.json, group1-shard1of1, group2-shard1of1, group3-shard1of1, and group4-shard1of1. These files let us use our trained deep learning (DL) model in the browser.

Now I am going to reveal the secret ingredient of this story.

I am going to explain three important things here; the rest is fairly straightforward. It all starts with including the TensorFlow.js script. Add the line below to your HTML file.

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@0.11.2"> </script>

Next is our init function. Two things are important in it.

1. I made it an async function because I want to make sure the model is loaded before the example is used. That’s why await is used when loading the model.

2. Loading the model. Use the code below for this.

model = await tf.loadModel('model.json');

Next is the most important one, our predict function.

function predict() {
   const imageData = ctx.getImageData(0, 0, 140, 140);
   // Convert the canvas pixels to a single-channel tensor
   var tfImg = tf.fromPixels(imageData, 1);
   // Resize the image to 28x28
   var smalImg = tf.image.resizeBilinear(tfImg, [28, 28]);
   // Cast to float32
   smalImg = tf.cast(smalImg, 'float32');
   // Add the batch dimension: [28, 28, 1] -> [1, 28, 28, 1]
   var tensor = smalImg.expandDims(0);
   // Normalize from 0-255 to 0-1
   tensor = tensor.div(tf.scalar(255));
   const prediction = model.predict(tensor);
   const predictedValues = prediction.dataSync();
   var isThereAnyPrediction = false;
   for (var index = 0; index < predictedValues.length; index++) {
      if (predictedValues[index] > 0.5) {
         isThereAnyPrediction = true;
         document.getElementById('rightside').innerHTML = '<br/>Predicted Number: ' + index;
      }
   }
   if (!isThereAnyPrediction) {
      document.getElementById('rightside').innerHTML = '<br>Unable to Predict';
   }
}

Let’s understand the above code step by step.

  1. First, we read the 140×140 image data from the canvas.
  2. Then we convert that image to a single-channel (grayscale) tensor (array).
  3. We want a 28×28 array (image), so we resize the tensor.
  4. We want the data in float32 format, so we cast it to float32.
  5. The model needs a [1, 28, 28, 1] shape because it expects (batch, height, width, channels), so we add a batch dimension.
  6. We need to normalize the data, so we divide it by 255.
  7. Then we try to predict the number: a digit is shown only if its probability is greater than 0.5; otherwise we report that we are unable to predict.
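
The final decision step in predict can be restated as a short Python sketch. The function name `pick_digit` and the probability lists are made up for illustration; the 0.5 threshold is the one used in the JavaScript code. Because softmax probabilities sum to 1, at most one class can exceed 0.5, so returning at the first match gives the same answer as the loop above.

```python
def pick_digit(probabilities, threshold=0.5):
    """Return the digit whose probability exceeds the threshold, or None."""
    for index, probability in enumerate(probabilities):
        if probability > threshold:
            return index
    return None

# A confident prediction: most of the probability mass is on digit 8.
confident = [0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0]
# An ambiguous drawing: the probability is spread evenly over all digits.
ambiguous = [0.1] * 10

print(pick_digit(confident))  # 8
print(pick_digit(ambiguous))  # None
```

A stricter threshold gives fewer wrong answers at the cost of more "Unable to Predict" results; taking the most probable class regardless of its value would always return a digit.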

You will find this code in another GitHub repository of mine.

You can see its live example here. It’s not perfect but performs decently.

Peace. Happy Coding.
