Introduction to image classification using CNN | EduGrad

In this blog, we are going to perform and understand image classification using a CNN (convolutional neural network) in Python.

Pikachu or Iron Man?

Our goal is to perform image classification, i.e. tell which class an input image belongs to. We will do this by training a convolutional neural network on about 50 images each of Iron Man and Pikachu, so that the next time the network sees an image containing Iron Man or Pikachu, it can predict which class the image belongs to.

The CNN image classification model we are building here can be trained on any classes you want; this Iron Man vs. Pikachu classifier is simply an easy example for understanding how convolutional neural networks work.

We’ll use the Keras deep learning library in Python to build our CNN (convolutional neural network).

The Dataset

The image classification dataset consists of about 50 images each of Iron Man and Pikachu, and the folder hierarchy is as shown below.

You can download the dataset here.


Now, let’s try building a Convolutional Neural Network that involves image classification techniques, as follows:

Fig: Steps involved in building a convolutional neural network

Step – 1: Convolution

Convolution is the first layer used to extract features from an input image. It is a mathematical operation that takes two inputs: an image matrix and a filter (or kernel). It preserves the relationship between pixels by learning image features using small squares of input data.

Fig: Convolution, the first step in building a CNN model

Consider a 5 x 5 image whose pixel values are 0 or 1, and a 3 x 3 filter matrix:

Fig: Image matrix multiplied by the kernel/filter matrix

Sliding the 3 x 3 filter matrix over the 5 x 5 image matrix and summing the element-wise products at each position produces a 3 x 3 output called the “feature map”, as shown below:

Fig: 3 x 3 output (feature map) matrix

Different operations such as edge detection, blur and sharpen can be obtained from the convolution of an image by applying different filters, as shown below:

Fig: Effects of applying different filters to an image
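To make the operation concrete, here is a minimal NumPy sketch of a valid (no padding, stride 1) convolution; the 5 x 5 pixel values and the 3 x 3 filter below are illustrative, not read off the figure:

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    sum the element-wise products at each position. Strictly speaking
    this is a cross-correlation, which is what CNN layers compute."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Illustrative 5 x 5 binary image and 3 x 3 filter
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map)
```

A 5 x 5 input with a 3 x 3 filter yields a 3 x 3 feature map, matching the figure: the output size is (5 − 3 + 1) in each dimension.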

Step – 2: Pooling

Pooling layers are used to reduce the number of parameters when the images are too large. Spatial pooling (also known as subsampling or downsampling) reduces the dimensionality of each feature map while preserving the important information. It can be of different types:

  • Max Pooling
  • Average Pooling
  • Sum Pooling

Pooling involves selecting a pooling operation, much like a filter, to be applied to the feature maps. The size of the pooling operation or filter is smaller than the size of the feature map; it is almost always 2 x 2 pixels applied with a stride of 2 pixels.

Hence a 2 x 2 pooling layer with stride 2 halves each dimension of the feature map, reducing the number of pixels or values in each feature map to one-quarter of the original. The pooling operation is specified, rather than learned. Two common functions used in the pooling operation are:

  • Average Pooling: The average value is calculated for each patch on the feature map.
  • Maximum Pooling (or Max Pooling): The maximum value is calculated for each patch of the feature map.
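Both operations can be sketched in a few lines of NumPy; the 4 x 4 feature-map values below are illustrative:

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, op=np.max):
    """Apply a pooling operation (max or mean) over non-overlapping
    size x size patches of a 2-D feature map."""
    h, w = feature_map.shape
    oh, ow = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = feature_map[i * stride:i * stride + size,
                                j * stride:j * stride + size]
            out[i, j] = op(patch)
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 8, 3],
                 [4, 9, 0, 5]])

print(pool2d(fmap, op=np.max))   # [[6. 4.] [9. 8.]]
print(pool2d(fmap, op=np.mean))  # [[3.75 2.25] [5.5 4. ]]
```

Note how the 4 x 4 map becomes 2 x 2: each dimension is halved, so the number of values drops to one-quarter, exactly as described above.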

Fig: Pooling layers reduce the number of image parameters

Step – 3: Flattening

After the previous two steps, we have a pooled feature map. We are now literally going to flatten the pooled feature map into a single column, as in the image below.

Fig: Flattening a pooled feature map into a column

The reason for doing this is the fact that we need to insert this data into an artificial neural network later on.

Fig: Flattening multiple pooled feature maps into one input vector

As you can see in the image above, we have multiple pooled feature maps from the previous step. After the flattening step, we end up with one long vector of input data that is passed into the artificial neural network for further processing.

For a quick revision, here is what we have after we’re done with each of the steps that we have covered up until now:

  • Input image (starting point)
  • Convolutional layer (convolution operation)
  • Pooling layer (pooling)
  • Creating Input layer for the artificial neural network (flattening)
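In NumPy terms, flattening is a single reshape; the hypothetical stack of three 2 x 2 pooled feature maps below is illustrative:

```python
import numpy as np

# Hypothetical stack of three 2 x 2 pooled feature maps
pooled_maps = np.array([[[6, 4], [9, 8]],
                        [[1, 0], [2, 5]],
                        [[3, 7], [4, 2]]])

# Flattening turns the whole stack into one long input vector
flat = pooled_maps.flatten()
print(flat.shape)  # (12,)
print(flat)
```

The resulting 12-element vector is exactly the kind of input the fully connected layers in the next step expect.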

Step – 4: Full connection

The objective of a fully connected layer is to take the results of the convolution/pooling process and use them to classify the image into a label (in a simple image classification example).

The output of the convolution/pooling stage is flattened into a single vector of values, each representing evidence that a certain feature is present. For example, if the image is of Pikachu, features representing things like a tail or yellow colour should produce strong evidence for the label “Pikachu”.

The image below illustrates how the input values flow into the first layer of neurons. They are multiplied by weights and passed through an activation function (typically ReLU), then forwarded to the output layer, where every neuron represents a classification label.

During training, the fully connected part of the network uses backpropagation to determine the most accurate weights. At prediction time, the neurons cast their “vote” on each of the labels, and the label that gets the most votes becomes the classification decision.
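The forward pass through a fully connected layer can be sketched as a matrix product followed by an activation; the layer sizes (12 inputs, 128 hidden units) and random weights below are illustrative stand-ins for trained values:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

rng = np.random.default_rng(0)

# Flattened input vector from the previous step (toy size: 12 values)
x = rng.random(12)

# One hidden fully connected layer: hidden = relu(W @ x + b)
W = rng.standard_normal((128, 12)) * 0.1
b = np.zeros(128)
hidden = relu(W @ x + b)

# Single output neuron with a sigmoid for a binary decision
w_out = rng.standard_normal(128) * 0.1
score = 1 / (1 + np.exp(-(w_out @ hidden)))
print(hidden.shape)          # (128,)
print(0.0 < score < 1.0)     # True
```

This mirrors the Keras layers built below: a Dense(128, relu) hidden layer followed by a Dense(1, sigmoid) output.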

Fig: Full connection

Implementation and Understanding CNN for Image Classification

Import all the required Keras packages that we are going to use to build our CNN, and make sure that every package is installed properly on your machine. We will perform image classification using Keras with a TensorFlow backend.

Importing the Keras libraries and packages

from keras.models import Sequential

For initializing our neural network model as a sequential network.

from keras.layers import Conv2D

Conv2D performs the convolution operation, the first step of a CNN, on the 2-D training images.

from keras.layers import MaxPooling2D

Importing the MaxPooling2D function to perform the pooling operation, since we want the maximum-value pixel from each region of interest.

from keras.layers import Flatten

Importing Flatten to perform flattening step in order to get a single long continuous linear vector.

from keras.layers import Dense

Importing Dense from keras.layers to perform the full connection of the neural network.

from keras.preprocessing.image import ImageDataGenerator

To generate batches of tensor image data with real-time data augmentation.

Now, we will create an object of the sequential class below:

# Sequential NN
classifier = Sequential()


# Convolution
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))

Adding a convolution layer by using the “Conv2D” function. The Conv2D function takes 4 arguments:

  1. No of filters: 32
  2. The shape of each filter: 3 x 3
  3. Input shape: (64, 64, 3), i.e. 64 x 64 pixel images with 3 colour channels (RGB)
  4. Activation function: ‘relu’ stands for a rectifier function.


# Pooling
classifier.add(MaxPooling2D(pool_size = (2, 2)))

Adding a pooling layer with a pool size of 2 x 2.


# Flatten
classifier.add(Flatten())

Using the Flatten function to perform the flattening step and obtain a single long continuous linear vector.

Fully Connected

# Fully Connected
classifier.add(Dense(units = 128, activation = 'relu'))

The Dense function adds a fully connected layer.

‘Units’: no. of nodes present in the hidden layer.

Activation function: the rectifier function (‘relu’).

Output Layer

# Output Layer
classifier.add(Dense(units = 1, activation = 'sigmoid'))

The output layer contains only one node, since this is binary classification: the network gives a binary output, either Iron Man or Pikachu.

So, here the activation function will be sigmoid, which squashes the output to a value between 0 and 1 that can be read as the probability of one of the two classes; thresholding it at 0.5 yields the binary decision.
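A quick sketch of how a sigmoid output turns into a label. The class mapping (Iron Man = 0, Pikachu = 1) assumes Keras assigns labels to the directories alphabetically, and the raw scores are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative raw (pre-activation) outputs from the final node
for z in (-2.0, 0.0, 3.0):
    p = sigmoid(z)
    label = 'Pikachu' if p >= 0.5 else 'Iron Man'
    print(round(p, 3), label)  # 0.119 Iron Man / 0.5 Pikachu / 0.953 Pikachu
```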


# Model Compilation
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy',
                   metrics = ['accuracy'])

Compilation using following parameters:

  • Optimizer = adam
  • Loss parameter = binary cross-entropy
  • Metrics = accuracy
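To make the loss concrete: binary cross-entropy for a true label y in {0, 1} and a predicted probability p is -[y log(p) + (1 - y) log(1 - p)], averaged over the batch. A quick NumPy check with illustrative labels and predictions:

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; eps guards against log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_pred)
                           + (1 - y_true) * np.log(1 - y_pred))))

y_true = np.array([1, 0, 1, 1])        # illustrative true labels
y_pred = np.array([0.9, 0.1, 0.8, 0.6])  # illustrative sigmoid outputs
print(round(binary_crossentropy(y_true, y_pred), 4))  # 0.2362
```

The loss is small when confident predictions are correct and grows sharply when a confident prediction is wrong, which is why it pairs naturally with a sigmoid output.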

Image Pre-processing

Before fitting images to the neural network, we need to augment the training data, i.e. generate slightly modified copies of the images. We will use the keras.preprocessing library for this task to prepare the images in the training set as well as the test set.

Here the name of the directory is taken as the label for all the images present in the folder i.e. images inside the ‘Iron Man’ named folder will be considered as Iron Man by Keras.

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_directory('Dataset/train',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

test_set = test_datagen.flow_from_directory('Dataset/test/',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')

These lines are just to preprocess images and prepare them for model training.

For the meaning of each parameter, see the documentation of Keras ImageDataGenerator.

Model Fitting (CNN model on images)

classifier.fit_generator(training_set,
                         steps_per_epoch = 109,
                         epochs = 6,
                         validation_data = test_set,
                         validation_steps = 36)

  • ‘steps_per_epoch’ is the number of batches drawn from the training generator in one epoch.
  • A single epoch is finished when the neural network has been trained on every training sample in a single pass, so training a model should consist of more than one epoch.
  • In this case, we have defined 6 epochs, since the dataset is small.

Making Predictions on Test Image:

import numpy as np
from IPython.display import Image,display

# To display the image in jupyter notebook

from keras.preprocessing import image

Function to predict

def who(img_file):
    # Takes an image file name with extension
    img_name = img_file

    # Image pre-processing
    test_image = image.load_img(img_name, target_size = (64, 64))

    # Displaying the image in the notebook
    display(Image(filename = img_name))

    test_image = image.img_to_array(test_image)
    test_image = np.expand_dims(test_image, axis = 0)

    # Classifying the image
    result = classifier.predict(test_image)

    # Giving labels (class 1 is Pikachu, class 0 is Iron Man)
    if result[0][0] >= 0.5:
        prediction = 'Pikachu'
    else:
        prediction = 'Iron Man'
    print(img_name, '->', prediction)

Prediction on every test image in the test dataset

# Getting all image file names from the test folder

import os

path = 'C:/Users/Edugrad/Desktop/Shubham Mishra/BLOG/Image Classification/Dataset'
files = []
# r = root, d = directories, f = files
for r, d, f in os.walk(path):
    for file in f:
        if '.jpg' in file:
            files.append(os.path.join(r, file))

# Predicting and classifying each test image

for f in files:
    who(f)

