5. Training a network
Contents
5. Training a network¶
Now that we know how to create a network, pass an input and an output, we only need to learn how to proceed for training and using it. Let’s remember the different steps needed:
from IPython.display import Image
# set path containing data folder or use default for Colab (/gdrive/My Drive)
local_folder = "../"
import urllib.request
urllib.request.urlretrieve('https://raw.githubusercontent.com/guiwitz/DLImaging/master/utils/check_colab.py', 'check_colab.py')
from check_colab import set_datapath
colab, datapath = set_datapath(local_folder)
Image(url='https://github.com/guiwitz/DLImaging/raw/master/illustrations/ML_principle.jpg',width=700)
Mounted at /gdrive

First we will pass training examples forward through the network
We measure an error between prediction and true label, the loss
We calculate the gradient of the loss respective to each parameter in the model. This is done by backpropagation
We adjust the parameters using the calculated gradient and an optimizer (e.g. SGD)
Additionally we will also see in this notebooks additional aspects such as training epochs and validation. The goal here is to once see the whole pipeline in detail before we start using tools that reduce some of the boiler-plate code necessary here.
Mini-batches¶
Before we create our network and define a loss, let’s remember how training samples are passed through the network. In principle we want to do each optimization step for the entire dataset not just a single image as training would have a difficult time to converge. However this is usually not possible and and instead what is generally done is to use mini-batches, i.e. the network is iteratively trained on subsets of traininig examples. So now instead of using the gradients produced by a single image, one can use for example the average gradients over the mini-batch:
Image(url='https://github.com/guiwitz/DLImaging/raw/master/illustrations/batch_processing.jpg',width=700)

PyTorch is in fact designed to handle batches by default. We can see that if we look at the documentation of modules such as nn.Linear which says that inputs should have the shape N x ... where N stands for batch size and ... for other dimensions such as channels, samples etc. This applies in fact to all modules, including those calculating losses. We can therefore feed examples with dimensions N x ... and PyTorch handles batch calculations for us.
Creating the network¶
What does this mean for out network? We only have to make one slight modification. We used x.view(-1) previously to flatten 32x32 images into vectors of 1024 elements. If we now feed a batch of size Nx32x32, this would generate a long vector of size Nx1024. So we need to adjust the view() command and specify the size of the first dimension. In such a way only the image dimensions are flattened: x.view(batchsize, -1). Alternatively we can use torch.flatten(start_dim = 1) specifying from which dimension we want to start flattening:
import torch
from torch import nn
from torch.functional import F
class Mynetwork(nn.Module):
def __init__(self, input_size, num_categories):
super(Mynetwork, self).__init__()
# define e.g. layers here e.g.
self.layer1 = nn.Linear(input_size, 100)
self.layer2 = nn.Linear(100, 100)
self.layer3 = nn.Linear(100, num_categories)
def forward(self, x):
# flatten the input
x = x.flatten(start_dim=1)
# define the sequence of operations in the network including e.g. activations
x = F.relu(self.layer1(x))
x = F.relu(self.layer2(x))
x = self.layer3(x)
return x
And now we instantiate it with:
model = Mynetwork(1024, 2)
Let’s check that inputs/outputs work as expected:
myinput = torch.randn((5,32,32))
myinput.size()
torch.Size([5, 32, 32])
myoutput = model(myinput)
myoutput.size()
torch.Size([5, 2])
If we want to pass a single element, e.g. in the inference phase, then we still have to reshape it so that it has dimensions N x .... The first dimension will just have a size of 1. The simples to do that is to use unsqueeze():
myimage = torch.randn((32,32))
myimage.size()
torch.Size([32, 32])
myimage = myimage.unsqueeze(0)
myimage.size()
torch.Size([1, 32, 32])
output = model(myimage)
output.size()
torch.Size([1, 2])
Defining a loss function and and backpropagating¶
In this example, we are going to classify images. Therefore we can use a standard loss function like cross-entropy which is also available in the torch.nn module:
criterion = nn.CrossEntropyLoss()
type(criterion)
torch.nn.modules.loss.CrossEntropyLoss
We see that the loss function is also a module i.e. it is differentiable and can just be integrated in the network. Also it sticks to the same “batch-logic” as the other layers. Therefore it expects inputs whose dimensions start with N for bactches. What we need here is the output of the network of size N x C where C is the number of categories (2 in our example) and a list of target labels (“true” labels) which have of course to be turned into a tensor.
We make up some data here:
mysample = torch.randn(3, 32*32)
mylabels = torch.tensor([0,1,1])
We pass them through the network:
output = model(mysample)
And compare output to target with the cross-entropy module:
loss = criterion(output, mylabels)
Note that the CrossEntropyLoss module automatically applies soft-max to the output and then calculates the loss. So we don’t need to have a soft-max layer at the end of our network.
Now that we have done the forward pass, we can calculate the gradients of the loss by backpropagation. This is simply done by calling the backward method:
loss.backward()
Optimizer¶
Now that we have an estimate of the loss and gradients, we can optimize all our paramters by using some optimization algorithm. Several are available in torch.optim. We use here the Adam optimizer, one of the “safest” choices. As arguments we need to pass a list of parameters that need to be optimized. We can do that by recovering them from our model:
list(model.parameters())
[Parameter containing:
tensor([[-2.4230e-02, -2.9601e-02, 1.2543e-02, ..., -9.2132e-03,
3.0666e-02, -1.1810e-02],
[-2.9703e-02, 9.6840e-03, -2.4394e-03, ..., 1.9239e-02,
-1.5664e-02, -7.6677e-03],
[-2.2457e-02, -1.2396e-02, 2.6122e-02, ..., 2.2048e-02,
2.7859e-02, -2.9419e-05],
...,
[-1.1190e-02, 6.2377e-03, 1.5656e-02, ..., -1.1900e-02,
-2.4993e-02, -1.2568e-02],
[-3.0729e-02, -2.4979e-02, 2.9544e-02, ..., -2.5657e-03,
-1.6533e-02, 1.2870e-02],
[-6.8288e-03, -2.5051e-02, -1.1303e-02, ..., 1.5859e-02,
-2.0387e-04, 2.9414e-02]], requires_grad=True),
Parameter containing:
tensor([-1.7122e-03, 1.5918e-02, -4.4739e-03, 1.5764e-02, 2.2635e-02,
3.4436e-03, -5.1088e-03, -2.8978e-03, -2.7906e-03, 2.6192e-03,
1.6524e-02, -2.0283e-02, 9.2856e-03, -3.0009e-02, -3.3266e-03,
1.9582e-03, 2.5571e-02, 1.6756e-02, -1.8997e-02, 9.6735e-03,
8.1861e-03, -7.4155e-03, 2.4495e-02, 2.8989e-02, 2.7048e-02,
-2.4382e-02, 1.5752e-02, -1.3777e-02, 1.1703e-02, -1.8277e-02,
-1.1853e-02, -6.8097e-03, -1.8303e-02, 1.8391e-02, -2.1328e-02,
1.9287e-02, 1.6026e-02, 1.5213e-02, 3.0921e-02, -1.0292e-02,
-2.3458e-02, -2.9849e-02, 2.5212e-02, 8.0392e-03, -2.0435e-02,
-1.9550e-02, -6.2864e-03, 2.3689e-02, -5.1168e-04, 1.8788e-02,
-6.7465e-05, 1.4272e-02, 2.8579e-02, -5.8571e-03, 6.8409e-04,
-2.6237e-02, 7.1548e-03, 1.8737e-02, 7.8104e-03, -2.6473e-03,
2.2297e-02, -2.2014e-02, -2.2084e-02, -2.9328e-02, 2.5234e-02,
2.7062e-02, -2.7877e-02, -2.0630e-02, 1.9244e-02, 1.6541e-02,
-1.7646e-02, 3.2225e-04, -2.5996e-02, -1.8584e-02, -1.1981e-02,
2.0410e-02, -2.9152e-02, -5.2047e-03, -2.8990e-02, 1.1037e-02,
-2.2448e-02, 8.7456e-03, 2.5595e-02, -2.9518e-02, 2.6131e-02,
-2.8135e-02, 1.0149e-02, -2.8058e-02, 1.9725e-02, -9.0041e-03,
6.5263e-03, -2.5279e-03, 7.8205e-04, 1.0576e-02, -2.5855e-02,
1.2865e-02, -1.0878e-02, 2.0166e-02, 4.9063e-03, -1.9915e-02],
requires_grad=True),
Parameter containing:
tensor([[ 0.0258, 0.0249, 0.0415, ..., 0.0734, 0.0886, 0.0256],
[ 0.0224, -0.0791, 0.0697, ..., 0.0799, -0.0583, 0.0356],
[-0.0444, 0.0861, 0.0043, ..., -0.0058, 0.0433, -0.0546],
...,
[-0.0779, -0.0799, -0.0888, ..., 0.0704, 0.0691, -0.0260],
[-0.0285, 0.0002, -0.0179, ..., -0.0485, -0.0550, -0.0864],
[ 0.0079, 0.0261, 0.0159, ..., 0.0066, 0.0430, -0.0903]],
requires_grad=True),
Parameter containing:
tensor([-0.0686, 0.0533, -0.0512, -0.0453, 0.0568, -0.0812, 0.0997, 0.0949,
0.0518, -0.0973, -0.0574, -0.0171, -0.0477, 0.0267, 0.0883, -0.0221,
-0.0081, -0.0195, 0.0312, -0.0848, 0.0833, -0.0780, 0.0221, -0.0620,
-0.0911, -0.0334, -0.0044, 0.0084, -0.0757, 0.0538, -0.0469, -0.0871,
0.0966, -0.0077, 0.0242, 0.0260, -0.0925, -0.0949, -0.0705, -0.0128,
-0.0791, -0.0225, 0.0399, -0.0004, -0.0040, 0.0331, 0.0448, -0.0077,
-0.0251, 0.0461, 0.0023, 0.0387, 0.0190, -0.0653, 0.0311, -0.0474,
0.0583, 0.0405, -0.0181, -0.0954, -0.0394, -0.0149, -0.0523, -0.0658,
0.0875, -0.0487, 0.0789, -0.0943, 0.0457, 0.0483, 0.0068, 0.0442,
0.0295, 0.0725, 0.0077, -0.0277, 0.0729, 0.0300, 0.0924, -0.0251,
-0.0407, -0.0457, 0.0920, 0.0220, -0.0311, -0.0713, -0.0917, 0.0528,
-0.0841, 0.0609, -0.0287, 0.0643, 0.0666, 0.0150, 0.0117, -0.0697,
0.0915, -0.0566, 0.0071, 0.0439], requires_grad=True),
Parameter containing:
tensor([[-0.0488, -0.0480, 0.0464, -0.0155, 0.0320, -0.0848, -0.0207, 0.0366,
-0.0237, -0.0256, 0.0927, -0.0487, 0.0759, 0.0051, 0.0847, -0.0673,
-0.0775, 0.0725, -0.0205, -0.0213, -0.0860, 0.0876, 0.0353, -0.0108,
0.0384, 0.0863, -0.0443, -0.0396, 0.0311, -0.0921, 0.0176, 0.0961,
0.0570, 0.0872, 0.0955, -0.0916, -0.0326, 0.0586, -0.0265, -0.0152,
0.0089, 0.0670, -0.0845, -0.0651, 0.0627, -0.0194, 0.0362, -0.0978,
-0.0479, 0.0322, 0.0994, 0.0299, -0.0573, 0.0094, 0.0723, -0.0712,
-0.0491, -0.0767, -0.0789, 0.0449, -0.0833, -0.0427, 0.0324, -0.0257,
-0.0539, 0.0964, 0.0777, -0.0553, 0.0449, 0.0230, 0.0627, 0.0233,
0.0523, -0.0321, 0.0372, 0.0796, 0.0989, 0.0227, 0.0504, -0.0470,
0.0642, -0.0675, -0.0066, 0.0623, 0.0872, -0.0835, 0.0643, -0.0060,
0.0400, 0.0126, 0.0281, -0.0782, 0.0891, -0.0004, 0.0695, -0.0990,
0.0498, -0.0406, 0.0366, 0.0385],
[ 0.0552, -0.0146, -0.0867, 0.0110, 0.0414, 0.0193, -0.0093, 0.0507,
0.0739, -0.0583, 0.0724, -0.0753, 0.0114, 0.0365, 0.0538, -0.0027,
0.0068, 0.0129, 0.0421, -0.0525, 0.0962, -0.0244, -0.0273, 0.0897,
0.0566, -0.0289, -0.0894, -0.0668, -0.0723, -0.0353, 0.0294, 0.0305,
-0.0016, 0.0211, 0.0146, 0.0306, -0.0577, -0.0625, -0.0444, 0.0509,
0.0365, -0.0197, 0.0022, 0.0369, -0.0161, -0.0527, -0.0642, 0.0794,
-0.0694, -0.0949, -0.0714, -0.0231, 0.0229, -0.0048, 0.0605, -0.0698,
0.0926, -0.0827, -0.0634, 0.0719, 0.0562, 0.0812, -0.0138, 0.0506,
0.0307, -0.0875, -0.0430, 0.0404, 0.0013, 0.0177, -0.0570, -0.0008,
0.0248, 0.0005, 0.0402, 0.0810, 0.0401, -0.0707, -0.0074, 0.0669,
0.0033, 0.0156, -0.0974, -0.0365, 0.0920, -0.0529, -0.0859, 0.0253,
0.0718, -0.0869, 0.0890, 0.0709, 0.0370, -0.0400, -0.0033, -0.0958,
-0.0685, 0.0994, 0.0436, -0.0269]], requires_grad=True),
Parameter containing:
tensor([-0.0131, -0.0262], requires_grad=True)]
import torch.optim as optim
optimizer = optim.Adam(model.parameters(), lr=0.001)
Now we have to acutally do one step of optimization using the step method:
optimizer.step()
Let’s check that some parameters have really changed:
list(model.parameters())
[Parameter containing:
tensor([[-0.0242, -0.0296, 0.0125, ..., -0.0092, 0.0307, -0.0118],
[-0.0287, 0.0087, -0.0014, ..., 0.0182, -0.0147, -0.0067],
[-0.0215, -0.0134, 0.0271, ..., 0.0210, 0.0289, -0.0010],
...,
[-0.0122, 0.0072, 0.0147, ..., -0.0109, -0.0260, -0.0136],
[-0.0297, -0.0240, 0.0305, ..., -0.0036, -0.0175, 0.0139],
[-0.0058, -0.0261, -0.0103, ..., 0.0149, -0.0012, 0.0304]],
requires_grad=True), Parameter containing:
tensor([-0.0017, 0.0149, -0.0055, 0.0168, 0.0236, 0.0044, -0.0061, -0.0019,
-0.0018, 0.0026, 0.0175, -0.0193, 0.0083, -0.0310, -0.0043, 0.0030,
0.0246, 0.0158, -0.0190, 0.0107, 0.0072, -0.0064, 0.0235, 0.0280,
0.0260, -0.0234, 0.0168, -0.0128, 0.0127, -0.0183, -0.0129, -0.0068,
-0.0193, 0.0174, -0.0203, 0.0183, 0.0150, 0.0142, 0.0299, -0.0093,
-0.0225, -0.0308, 0.0262, 0.0070, -0.0194, -0.0206, -0.0073, 0.0227,
0.0005, 0.0198, 0.0009, 0.0153, 0.0296, -0.0069, 0.0017, -0.0272,
0.0062, 0.0197, 0.0068, -0.0016, 0.0223, -0.0210, -0.0211, -0.0293,
0.0262, 0.0281, -0.0289, -0.0196, 0.0202, 0.0155, -0.0166, 0.0013,
-0.0260, -0.0186, -0.0130, 0.0194, -0.0302, -0.0062, -0.0300, 0.0100,
-0.0224, 0.0077, 0.0266, -0.0305, 0.0271, -0.0281, 0.0101, -0.0291,
0.0187, -0.0100, 0.0055, -0.0035, -0.0002, 0.0106, -0.0249, 0.0119,
-0.0099, 0.0212, 0.0059, -0.0209], requires_grad=True), Parameter containing:
tensor([[ 0.0258, 0.0249, 0.0415, ..., 0.0734, 0.0886, 0.0256],
[ 0.0224, -0.0781, 0.0707, ..., 0.0809, -0.0593, 0.0356],
[-0.0444, 0.0851, 0.0033, ..., -0.0068, 0.0443, -0.0546],
...,
[-0.0779, -0.0789, -0.0878, ..., 0.0714, 0.0701, -0.0250],
[-0.0285, 0.0012, -0.0169, ..., -0.0475, -0.0540, -0.0854],
[ 0.0079, 0.0251, 0.0149, ..., 0.0056, 0.0420, -0.0913]],
requires_grad=True), Parameter containing:
tensor([-0.0686, 0.0543, -0.0522, -0.0443, 0.0578, -0.0822, 0.1007, 0.0959,
0.0518, -0.0973, -0.0584, -0.0171, -0.0487, 0.0277, 0.0873, -0.0221,
-0.0071, -0.0205, 0.0322, -0.0848, 0.0823, -0.0770, 0.0211, -0.0610,
-0.0901, -0.0344, -0.0054, 0.0074, -0.0767, 0.0538, -0.0459, -0.0881,
0.0956, -0.0077, 0.0242, 0.0270, -0.0925, -0.0959, -0.0705, -0.0118,
-0.0791, -0.0225, 0.0409, 0.0006, -0.0050, 0.0321, 0.0448, -0.0077,
-0.0251, 0.0451, 0.0013, 0.0387, 0.0190, -0.0663, 0.0301, -0.0484,
0.0593, 0.0395, -0.0181, -0.0964, -0.0394, -0.0149, -0.0523, -0.0658,
0.0865, -0.0487, 0.0779, -0.0943, 0.0447, 0.0473, 0.0058, 0.0432,
0.0285, 0.0735, 0.0087, -0.0267, 0.0719, 0.0300, 0.0914, -0.0261,
-0.0407, -0.0447, 0.0910, 0.0210, -0.0301, -0.0713, -0.0927, 0.0528,
-0.0841, 0.0619, -0.0277, 0.0653, 0.0656, 0.0140, 0.0107, -0.0697,
0.0905, -0.0556, 0.0081, 0.0429], requires_grad=True), Parameter containing:
tensor([[-0.0488, -0.0490, 0.0454, -0.0165, 0.0310, -0.0838, -0.0217, 0.0356,
-0.0237, -0.0256, 0.0917, -0.0487, 0.0749, 0.0041, 0.0837, -0.0673,
-0.0785, 0.0735, -0.0215, -0.0213, -0.0850, 0.0886, 0.0343, -0.0118,
0.0374, 0.0873, -0.0453, -0.0406, 0.0301, -0.0921, 0.0166, 0.0951,
0.0580, 0.0872, 0.0955, -0.0926, -0.0326, 0.0576, -0.0265, -0.0162,
0.0089, 0.0670, -0.0855, -0.0661, 0.0617, -0.0204, 0.0362, -0.0978,
-0.0479, 0.0332, 0.0984, 0.0299, -0.0573, 0.0084, 0.0733, -0.0702,
-0.0501, -0.0777, -0.0789, 0.0459, -0.0833, -0.0427, 0.0324, -0.0257,
-0.0529, 0.0964, 0.0787, -0.0553, 0.0439, 0.0240, 0.0617, 0.0223,
0.0513, -0.0331, 0.0362, 0.0786, 0.0999, 0.0227, 0.0494, -0.0460,
0.0642, -0.0685, -0.0076, 0.0613, 0.0862, -0.0835, 0.0633, -0.0060,
0.0400, 0.0136, 0.0271, -0.0792, 0.0881, 0.0006, 0.0685, -0.0990,
0.0508, -0.0416, 0.0356, 0.0395],
[ 0.0552, -0.0136, -0.0857, 0.0120, 0.0424, 0.0183, -0.0083, 0.0517,
0.0739, -0.0583, 0.0734, -0.0753, 0.0124, 0.0375, 0.0548, -0.0027,
0.0078, 0.0119, 0.0431, -0.0525, 0.0952, -0.0254, -0.0263, 0.0907,
0.0576, -0.0299, -0.0884, -0.0658, -0.0713, -0.0353, 0.0304, 0.0315,
-0.0026, 0.0211, 0.0146, 0.0316, -0.0577, -0.0615, -0.0444, 0.0519,
0.0365, -0.0197, 0.0032, 0.0379, -0.0151, -0.0517, -0.0642, 0.0794,
-0.0694, -0.0959, -0.0704, -0.0231, 0.0229, -0.0038, 0.0595, -0.0708,
0.0936, -0.0817, -0.0634, 0.0709, 0.0562, 0.0812, -0.0138, 0.0506,
0.0297, -0.0875, -0.0440, 0.0404, 0.0023, 0.0167, -0.0560, 0.0002,
0.0258, 0.0015, 0.0412, 0.0820, 0.0391, -0.0707, -0.0064, 0.0659,
0.0033, 0.0166, -0.0964, -0.0355, 0.0930, -0.0529, -0.0849, 0.0253,
0.0718, -0.0879, 0.0900, 0.0719, 0.0380, -0.0410, -0.0023, -0.0958,
-0.0695, 0.1004, 0.0446, -0.0279]], requires_grad=True), Parameter containing:
tensor([-0.0141, -0.0252], requires_grad=True)]
Measuring accuracy¶
We use the cross-entropy as loss because it allows us to optimize our network. However what we are ultimately interested in is the accuracy of our model i.e. whether the correct label has been found or not. Such a binary answer is not useful for optimization but is what we want to monitor in the end. Let’s generate some random data and see how we can calculate this:
myimages = torch.randn((3,32,32))
labels = torch.randint(0,2,(3,))
labels
tensor([0, 1, 0])
output = model(myimages)
output
tensor([[ 0.2409, -0.0868],
[ 0.1836, -0.0178],
[ 0.1323, -0.0133]], grad_fn=<AddmmBackward0>)
The predicted category is the one with the highest probability (not normalized here but it doesn’t matter). We can therefore just look for the index of the maximum value along the horizontal dimension:
output.argmax(dim=1)
tensor([0, 0, 0])
Now we can compare prediction and true label:
labels == output.argmax(dim=1)
tensor([ True, False, True])
If we take the sum over this tensor, it tells us how many samples in the batch were correctly predicted and the average accuracy is:
(labels == output.argmax(dim=1)).sum()/3
tensor(0.6667)
Dataset¶
Now that we know that all steps work, we want to test our network. We will create a synthetic dataset for that using skimage.draw. We will just generate random images with either circles or triangles. As we have an “infinite” amount of data available, we just artificially set a size for our dataset.
Don’t forget that we need a training and a validation dataset. Every time we have trained the network with the whole training dataset we check prediction quality with the validation dataset to make sure e.g. we are not over-fitting.
from skimage.draw import random_shapes
import matplotlib.pyplot as plt
import numpy as np
The following function takes as input a keyword - triangle, circle, etc. - and outputs an image with that object as a PyTorch tensor:
def make_image(shape, imsize):
"""Generate image of given shape scaled 0-1.
shape: str
shape to draw (circle, triangle, rectangle)
imsize: int
size of image
"""
image, _ = random_shapes(
image_shape=(imsize,imsize),max_shapes=1, min_shapes=1,
num_channels=1, channel_axis=None, shape=shape, min_size=8)
#normalize
image = (255-image)/255
# turn into tensor
image_tensor = torch.tensor(image,dtype=torch.float32)
return image_tensor
To create pairs of images and labels we simply create a list of possible shapes and randomly pick values from there. We should also not forget to transform the label into a tensor. Let’s make a single image:
im_type = ['circle', 'triangle','rectangle']
num_cat = len(im_type)
image_size = 32
label = torch.randint(0,num_cat,(1,))
image = make_image(im_type[label], image_size)
fig, ax = plt.subplots()
ax.imshow(image)
ax.set_title(im_type[label]);
We want to train our network using mini-batches, and each batch should have a size N x H x W where N is the batch size and H,W the image dimension. We can create a batch by stacking multiple 2D tensors together:
batch_size = 10
single_batch = torch.stack([make_image('circle', image_size) for x in range(batch_size)])
single_batch.size()
torch.Size([10, 32, 32])
Of course we want to mix the different image types, so we generate labels of size batch_size as well:
label = torch.randint(0,num_cat,(batch_size,))
single_batch = torch.stack([make_image(im_type[x], image_size) for x in label])
fig, ax = plt.subplots(1,4)
for x in range(4):
ax[x].imshow(single_batch[x,:,:])
Finally we can create our full dataset (as a list of tensors) by generating M bachtes in order to have M x N training examples:
# size of training dataset
batch_size = 10
training_size = 5000
validation_size = 100
train_batch_number = int(training_size/batch_size)
validation_batch_number = int(validation_size/batch_size)
training_label = [torch.randint(0,num_cat,(batch_size,)) for i in range(train_batch_number)]
validation_label = [torch.randint(0,num_cat,(batch_size,)) for i in range(validation_batch_number)]
train_batches = [torch.stack([make_image(im_type[x], image_size) for x in lab]) for lab in training_label]
valid_batches = [torch.stack([make_image(im_type[x], image_size) for x in lab]) for lab in validation_label]
Training loop¶
Now we can create a loop where we iterate through our batches to train our network and go through the steps defined above. We will do two loops:
Over epochs: one epoch representing a training step over all batches
Over batches
We do validation only once per epoch to see how training goes.
#del model
model = Mynetwork(1024, 3)
optimizer = optim.Adam(model.parameters(), lr=0.001)
for epoch in range(10):
print(f'epoch: {epoch}')
# initialize running accuracy
running_accuracy = 0
for t in range(train_batch_number):
# get batch
label = training_label[t]
mybatch = train_batches[t]
# calculate predicted label and calculate loss
pred = model(mybatch)
loss = criterion(pred, label)
# backpropagate the loss
loss.backward()
# do the optimization step
optimizer.step()
# set gradients to zero as PyTorch accumulates them otherwise
optimizer.zero_grad()
# calculate accuracy
mean_accuracy = (torch.argmax(pred,dim=1) == label).sum()/batch_size
running_accuracy+=mean_accuracy
every_nth = 1000
if t % every_nth == every_nth-1:
print(f'accuracy: {running_accuracy/every_nth}')
running_accuracy = 0.0
# validation
valid_accuracy = 0
for t in range(validation_batch_number):
# get batch
label = validation_label[t]
mybatch = valid_batches[t]
# calculate predicted label
pred = model(mybatch)
# calculate accuracy
mean_accuracy = (torch.argmax(pred,dim=1) == label).sum()/batch_size
valid_accuracy += mean_accuracy
valid_accuracy = valid_accuracy/validation_batch_number
print(f'valid_accuracy: {valid_accuracy}')
epoch: 0
valid_accuracy: 0.699999988079071
epoch: 1
valid_accuracy: 0.7099999785423279
epoch: 2
valid_accuracy: 0.7699999809265137
epoch: 3
valid_accuracy: 0.8100000619888306
epoch: 4
valid_accuracy: 0.8500000238418579
epoch: 5
valid_accuracy: 0.8600000143051147
epoch: 6
valid_accuracy: 0.8399999737739563
epoch: 7
valid_accuracy: 0.8700000643730164
epoch: 8
valid_accuracy: 0.8399999737739563
epoch: 9
valid_accuracy: 0.8799999952316284
We see that accuracy improves but not to a perfect level. We can try to increase the number of epochs or the training size. We can also try to understand where the problem is. For example we can try to find out which images are most mis-classified using a confusion matrix from scikit-learn.
from sklearn.metrics import confusion_matrix
import pandas as pd
import seaborn as sn
We generate a test batch and obtain a prediction with our model:
label = torch.randint(0,len(im_type),(100,))
mybatch = torch.stack([make_image(im_type[x], image_size) for x in label])
pred = model(mybatch)
Again the maximum index for each element of the batch gives us the final class:
pred.argmax(dim=1)
tensor([1, 1, 1, 0, 0, 2, 0, 2, 2, 1, 0, 0, 2, 2, 1, 2, 2, 0, 2, 2, 0, 1, 0, 0,
0, 2, 0, 0, 2, 1, 1, 1, 2, 0, 1, 0, 1, 1, 0, 0, 0, 2, 2, 1, 2, 2, 0, 0,
2, 0, 2, 2, 1, 0, 2, 2, 1, 0, 1, 2, 1, 0, 2, 0, 0, 1, 1, 2, 0, 0, 0, 2,
0, 2, 1, 0, 1, 1, 1, 2, 0, 1, 2, 1, 2, 1, 1, 1, 1, 0, 2, 1, 0, 1, 2, 0,
0, 0, 1, 2])
label
tensor([1, 1, 1, 0, 0, 2, 0, 2, 2, 1, 0, 0, 2, 0, 0, 2, 2, 0, 2, 2, 0, 1, 0, 0,
2, 2, 0, 0, 2, 1, 1, 1, 2, 0, 1, 0, 1, 1, 0, 0, 0, 2, 2, 1, 2, 2, 0, 0,
2, 0, 2, 2, 1, 0, 2, 2, 1, 0, 1, 2, 1, 0, 2, 2, 0, 1, 1, 2, 0, 2, 0, 2,
0, 2, 1, 0, 1, 1, 1, 2, 0, 1, 2, 1, 2, 1, 1, 1, 1, 0, 2, 1, 0, 1, 2, 0,
0, 0, 0, 2])
Let’s calcualte the confusion matrix and transform it into a Dataframe that we can then easily plot with seaborn:
df_cm = pd.DataFrame(confusion_matrix(pred.argmax(dim=1), label), index = im_type,
columns = im_type)
plt.figure(figsize = (10,7))
sn.heatmap(df_cm, annot=True);
We see that the problem is mostly the circle. Because we use a tiny image, small circles are not smooth and can look like rectangles or triangles.
Exercises¶
Re-use the network you have create for the exercise 4. Make sure it can take batches of 2D images as input.
Create an image generator which creates black (value=0) images (2D tensors) with a single white (value=1) line which is vertical (class #1) or horizontal (class #2)
Create training and validation data generators
Add a training loop and train. Verify that the nework works
“Bonus”: try to add various levels of noise to the image and see how prediction is affected