In this blog, we are going to build and understand an image classifier using a CNN (convolutional neural network) in Python.
Pikachu or Iron Man?
Our goal is to perform image classification, i.e. to tell which class an input image belongs to. We will do this by training a convolutional neural network on about 50 images each of Iron Man and Pikachu, so that the network learns to predict which class a new image containing Iron Man or Pikachu belongs to.
The CNN image classification model we build here can be trained on any classes you want; this Iron Man vs. Pikachu classifier is just a simple example for understanding how convolutional neural networks work.
We’ll use the Keras deep learning library in Python to build our CNN (convolutional neural network).
The image classification dataset consists of 50+ images each of Iron Man and Pikachu, and the folder hierarchy is as shown below.
You can download the data set here
Now, let’s try building a Convolutional Neural Network that involves image classification techniques, as follows:
Step – 1: Convolution
Convolution is the first layer used to extract features from an input image. It is a mathematical operation that takes two inputs: an image matrix and a filter (also called a kernel). It preserves the spatial relationship between pixels by learning image features using small squares of input data.
Consider a 5 x 5 image whose pixel values are 0 or 1, and a 3 x 3 filter matrix:
The convolution of the 5 x 5 image matrix with the 3 x 3 filter matrix produces what is called a “Feature Map”, as shown below:
Different operations such as edge detection, blurring and sharpening can be obtained from the convolution of an image by applying different filters, as shown below:
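For intuition, here is a minimal NumPy sketch of the convolution operation itself (no padding, stride 1). The image and filter values below are illustrative, not necessarily the ones in the figure:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    sum the element-wise products at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    feature_map = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            feature_map[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return feature_map

# A 5 x 5 binary image and a 3 x 3 filter, as in the text
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
print(convolve2d(image, kernel))  # a 3 x 3 feature map
```

Note how the 5 x 5 input shrinks to a 3 x 3 feature map: the filter fits in only 3 positions along each axis.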
Step – 2: Pooling
Pooling layers are used to reduce the number of parameters when the images are too large. Spatial pooling (also known as subsampling or downsampling) reduces the dimensionality of each feature map while preserving the important information. It can be of different types:
- Max Pooling
- Average Pooling
- Sum Pooling
Pooling involves selecting a pooling operation, much like a filter, to be applied to feature maps. The size of the pooling window is generally smaller than the feature map; typically 2×2 pixels applied with a stride of 2 pixels.
With a 2×2 window and a stride of 2, the pooling layer halves each dimension of the feature map, reducing the number of values in each feature map to one-quarter of the original. The pooling operation is specified, rather than learned. Two common functions used in the pooling operation are:
- Average Pooling: The average value is calculated for each patch on the feature map.
- Maximum Pooling (or Max Pooling): The maximum value is calculated for each patch of the feature map.
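A minimal NumPy sketch of both operations, using a made-up 4 x 4 feature map:

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode='max'):
    """Apply max or average pooling with the given window size and stride."""
    oh = (feature_map.shape[0] - size) // stride + 1
    ow = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    op = np.max if mode == 'max' else np.mean
    for i in range(oh):
        for j in range(ow):
            patch = feature_map[i*stride:i*stride+size,
                                j*stride:j*stride+size]
            out[i, j] = op(patch)
    return out

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 1., 2.],
                 [7., 2., 9., 0.],
                 [3., 4., 1., 8.]])
print(pool2d(fmap, mode='max'))  # max of each 2x2 patch: [[6, 4], [7, 9]]
print(pool2d(fmap, mode='avg'))  # mean of each 2x2 patch: [[3.75, 2.25], [4.0, 4.5]]
```

As expected, the 4 x 4 map is reduced to 2 x 2: each dimension is halved, so only a quarter of the values remain.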
Step – 3: Flattening
After the previous two steps, we have a pooled feature map. Now we literally flatten the pooled feature map into a single column, as in the image below.
The reason for doing this is the fact that we need to insert this data into an artificial neural network later on.
As you see in the image above, we have multiple pooled feature maps from the previous step. After the flattening step, we end up with one long vector of input data that is passed through the artificial neural network for further processing.
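Flattening a stack of pooled feature maps into one vector is a one-liner in NumPy; the values below are made up for illustration:

```python
import numpy as np

# Two 2 x 2 pooled feature maps from the previous step (illustrative values)
pooled_maps = np.array([[[6., 4.],
                         [7., 9.]],
                        [[3., 1.],
                         [2., 8.]]])

# Flatten the whole stack into one long vector for the dense layers
flat = pooled_maps.flatten()
print(flat)        # [6. 4. 7. 9. 3. 1. 2. 8.]
print(flat.shape)  # (8,)
```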
For a quick revision, here is what we have after we’re done with each of the steps that we have covered up until now:
- Input image (starting point)
- Convolutional layer (convolution operation)
- Pooling layer (pooling)
- Creating Input layer for the artificial neural network (flattening)
Step – 4: Full connection
The objective of a fully connected layer is to take the results of the convolution/pooling process and use them to classify the image into a label (in a simple image classification example).
The output of convolution/pooling is flattened into a single vector of values, each roughly indicating how strongly a certain feature is present. For example, if the image is of Pikachu, features representing things like a tail or yellow color should produce high values for the label “Pikachu”.
The image below illustrates how the input values flow into the first layer of neurons. They are multiplied by weights and passed through an activation function (typically ReLU), then forwarded to the output layer, where every neuron represents a classification label.
During training, the fully connected part of the CNN uses backpropagation to determine the most accurate weights, so each neuron learns weights that prioritize the most appropriate label. Finally, the neurons cast their “vote” on each of the labels, and the label that gets the most votes becomes the classification decision.
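To make the forward pass concrete, here is a minimal NumPy sketch of a tiny fully connected network. The input vector, layer sizes, and all weight values are made up purely for illustration; in the real model Keras learns them via backpropagation:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Flattened vector from the pooling/flattening steps (4 values here)
x = np.array([6., 4., 7., 9.])

# Illustrative weights: 4 inputs -> 3 hidden neurons -> 1 output
W1 = np.array([[ 0.1,  -0.2,  0.05],
               [ 0.3,   0.1, -0.1 ],
               [-0.1,   0.2,  0.15],
               [ 0.05,  0.1,  0.2 ]])
b1 = np.array([0.1, -0.1, 0.05])

hidden = relu(x @ W1 + b1)          # hidden layer with ReLU activation

W2 = np.array([[0.4], [-0.3], [0.2]])
b2 = np.array([0.1])
output = sigmoid(hidden @ W2 + b2)  # single sigmoid output in (0, 1)
print(output)                       # score for the positive class
```

This mirrors the Dense layers we add with Keras below: a ReLU hidden layer followed by a single sigmoid output for binary classification.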
Implementation and Understanding CNN for Image Classification
Import all the required Keras packages that we are going to use to build our CNN, and make sure every package is installed properly on your machine. We will do image classification using Keras with a TensorFlow backend.
Importing the Keras libraries and packages
from keras.models import Sequential
For initializing our neural network model as a sequential network.
from keras.layers import Conv2D
Conv2D is to perform the convolution operation on 2-D images, which is the first step of a CNN, on the training images.
from keras.layers import MaxPooling2D
Importing the MaxPooling2D function to perform the pooling operation, since we need the maximum-value pixel from each region of interest.
from keras.layers import Flatten
Importing Flatten to perform flattening step in order to get a single long continuous linear vector.
from keras.layers import Dense
Importing Dense from keras.layers to perform the full connection of the neural network.
from keras.preprocessing.image import ImageDataGenerator
To generate batches of tensor image data with real-time data augmentation.
Now, we will create an object of the sequential class below:
# Sequential NN
classifier = Sequential()
# Convolution
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
Adding a convolution layer by using the “Conv2D” function. The Conv2D function takes 4 arguments:
- No of filters: 32
- The shape of each filter: 3 x 3
- Input shape: 64 x 64 pixels with 3 channels (specifies RGB)
- Activation function: ‘relu’ stands for a rectifier function.
# Pooling
classifier.add(MaxPooling2D(pool_size = (2, 2)))
Adding a pooling layer. Pool size = 2×2 matrix
# Flatten
classifier.add(Flatten())
Using the Flatten function to perform the flattening step.
# Fully Connected
classifier.add(Dense(units = 128, activation = 'relu'))
A dense function used to add a fully connected layer,
‘Units’: No. of nodes present in a hidden layer,
Activation function: rectifier function.
# Output Layer
classifier.add(Dense(units = 1, activation = 'sigmoid'))
The output layer contains only one node since it is binary classification and will give a binary output of either Iron Man or Pikachu.
So, here the activation function is sigmoid, which outputs a value between 0 and 1 that can be thresholded to give a binary decision.
# Model Compilation
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
Compilation using following parameters:
- Optimizer = adam
- Loss parameter = binary cross-entropy
- Metrics = accuracy
Before fitting images to the neural network, we need to preprocess and augment the training images. We will use the keras.preprocessing library for this task to prepare the images in the training set as well as the test set.
Here the name of the directory is taken as the label for all the images present in the folder i.e. images inside the ‘Iron Man’ named folder will be considered as Iron Man by Keras.
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_directory('Dataset/train',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

test_set = test_datagen.flow_from_directory('Dataset/test/',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')
These lines are just to preprocess images and prepare them for model training.
For each parameter meaning, you can study the documentation of Keras ImageDataGenerator
Model Fitting (CNN model on images)
classifier.fit_generator(training_set,
                         steps_per_epoch = 109,
                         epochs = 6,
                         validation_data = test_set,
                         validation_steps = 36)
- ‘steps_per_epoch’ is the number of batches drawn from the generator in each epoch; here it is set to the number of training images (a common convention in older tutorials, although the number of images divided by the batch size is usually sufficient).
- An epoch is one complete pass of the neural network over every training sample. Training should usually consist of more than one epoch.
- In this case, we have defined 6 epochs, since the dataset is small.
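The step counts can also be derived from the dataset size rather than hard-coded. The image counts below (109 train, 36 test) are taken from the numbers used in the fit call; the batch size matches the generators:

```python
import math

# Image counts taken from the fit call above; batch_size matches the generators
num_train_images = 109
num_test_images = 36
batch_size = 32

# Number of generator batches needed to see every image once per epoch
steps_per_epoch = math.ceil(num_train_images / batch_size)
validation_steps = math.ceil(num_test_images / batch_size)
print(steps_per_epoch, validation_steps)  # 4 2
```

In newer Keras versions, `len(training_set)` returns this batch count directly.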
Making Predictions on Test Image:
import numpy as np

# To display the image in a Jupyter notebook
from IPython.display import Image, display

from keras.preprocessing import image
Function to predict

def who(img_file):
    # takes an image file name with extension
    img_name = img_file

    # Image Pre-processing
    test_image = image.load_img(img_name, target_size = (64, 64))

    # displaying image
    display(Image(filename=img_name))
    test_image = image.img_to_array(test_image)
    test_image = np.expand_dims(test_image, axis = 0)

    # classifying image
    result = classifier.predict(test_image)
    # mapping of class names to indices, e.g. {'Iron Man': 0, 'Pikachu': 1}
    training_set.class_indices

    # Giving Labels
    if result[0][0] > 0.5:
        prediction = 'Pikachu'
    else:
        prediction = 'Iron Man'
    print(prediction)
Prediction on every test image in the test dataset
# Getting all image file names from the test folder
import os

# Getting all image file names from the test folder
path = 'C:/Users/Edugrad/Desktop/Shubham Mishra/BLOG/Image Classification/Dataset /test/test'
files = []

# r = root, d = directories, f = files
for r, d, f in os.walk(path):
    for file in f:
        if '.jpg' in file:
            files.append(os.path.join(r, file))
# Predicting and classifying each test image
for f in files:
    who(f)
    print('\n')