IMAGE CLASSIFIER on CIFER10 dataset

6 min readMar 27, 2021

CIFER-10 dataset exploration & experiments

Problem Statement — To build a model to classify an image. Given dataset is CIFER-10 dataset.

Outcome — Given an image, out model will predict the class of that image.

so let’s start.

What is CIFER-10 ?

CIFER-10 is a dataset of 32*32 size colored images by CIFER, which consists 10 classes (‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’).

Let’s see, What are the Steps to Train a model ?—

We will do the following steps in order:

Load and normalizing the CIFAR10 training and test datasets using torchvision

2. Define a Convolutional Neural Network

3. Define a loss function

4. Train the network on the training data

5. Test the network on the test data

Let’s do the steps one by one

Loading and Normalizing CIFAR10

Pytorch gives us CIFER-10 dataset, so lets load our data & transform it using below code.transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)print("Train Dataset : ", len(trainloader))testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)print("Test Dataset : ", len(testloader))classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Here, we can see that our files has been downloaded. There are total 12500 data in training set whereas 2500 data in test set.

Let’s visualize some of the training images.

each image shape is 32 * 32 with 3 channel.

now let define a convolutional neural network.

2. Define Convolutional Neural Network

Before that lets understand what is CNN?

A CNN is a type of neural network, empowered with some specific hidden layers, including the Convolutional layer, the Pooling layer, and the Fully-Connected layer. CNN is mainly used in image processing applications.

since our image is of shape 32*32 the structure will look something like this

A convolutional operation is performed using kernel, maxpool, activation function, fully connected layer.

The kernel/filter slides over the input signal as shown below. You can see the filter (the green square) is sliding over our input (the blue square) and the sum of the convolution goes into the feature map (the red square).read it here.

Maxpool — Max pooling is a sample-based discretization process. The objective is to down-sample an input representation (image, hidden-layer output matrix, etc.), reducing its dimensionality

Activation Functions — the activation function of a node defines the output of that node given an input or set of inputs. perform a transformation on the input received, in order to keep values within a manageable range. there are many activation functions. we will use ReLU.

Now Let’s code our network

# Define a Neural Network with 2 convolution layer for 3 channel imagesclass Net2CL(nn.Module):def __init__(self):super(Net2CL, self).__init__()# conv layersself.conv1 = nn.Conv2d(3, 6, 5)self.conv2 = nn.Conv2d(6, 16, 5)# pooling layersself.pool = nn.MaxPool2d(2, 2)# fully connected layersself.fc1 = nn.Linear(16 * 5 * 5, 120)self.fc2 = nn.Linear(120, 84)self.fc3 = nn.Linear(84, 10)def forward(self, x):x = self.pool(F.relu(self.conv1(x)))x = self.pool(F.relu(self.conv2(x)))
x = x.view(-1, 16 * 5 * 5)x = F.relu(self.fc1(x))x = F.relu(self.fc2(x))x = self.fc3(x)return xnet2cl = Net2CL()net2cl = net2cl.to(device)

Our network shape looks like this

likewise we have defined for 3 & 4 conv layer, see structure below. you can see here.

3. Define a Loss function and optimizer

Loss Function — It’s a method of evaluating how well specific algorithm models the given data. If predictions deviates too much from actual results, loss function would cough up a very large number.we will use Cross-Entropy Loss, which is most common for classification task.

Cross-Entropy Loss

It measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. So predicting a probability of .012 when the actual observation label is 1 would be bad and result in a high loss value. A perfect model would have a log loss of 0.

nn.CrossEntropyLoss()

Optimizer — The optimizer takes the parameters we want to update, the learning rate we want to use (and possibly many other parameters as well, and performs the updates through its step() method.

There are many optimisers, we will use SGD or Adam

optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9)optim.Adam(net.parameters(), lr=learning_rate)

Now its time to train our Network.

after training our network on 2 layer model with SGD, the train loss looks like this.

Here we can see our loss reduced a lot with increasing epochs.

like this we can train our network on 3 layer & 4 layer. you can see the code on git.

Epoch vs Loss with 3 layers is as —

Epoch vs loss with 4 layer is as —

training on 2 layer gives us 66% accuracy whereas we have achieved a total of 70.40% accuracy with 3 layer model on test data.

Here is a comparison of model performance.

Here is a comparison of time elapsed of different layer neurons.

we can see Network with 2 layer takes around 2500000 ms ,network with 3 conv layer takes around 820000 ms and network with 4 layer takes around 7100000 ms