
3. Tensor calculations

We have seen in the previous chapter that PyTorch-based neural networks require a specific type of array (matrix), namely tensors. These tensors are very similar to NumPy arrays but offer additional functionality needed for deep learning. There is an ongoing effort to make switching between different array/tensor formats (NumPy, tensors, xarray etc.) more transparent in the future, but for the moment let’s briefly explore PyTorch tensors, accessible from the torch module:

import torch
import numpy as np
import matplotlib.pyplot as plt

Creating arrays

NumPy and PyTorch share a lot of functions and methods, so you won’t feel completely lost. For example, you can create arrays filled with ones:

t_array = torch.ones((3,2))
t_array
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
n_array = np.ones((3,2))
n_array
array([[1., 1.],
       [1., 1.],
       [1., 1.]])

You can also check the data type of an array with dtype:

print(f't_array dtype: {t_array.dtype}')
print(f'n_array dtype: {n_array.dtype}')
t_array dtype: torch.float32
n_array dtype: float64

PyTorch also implements many other array-creation functions that are very similar to NumPy’s. For example, arrays of random numbers:

t_random = torch.randint(0,255,(10,10))
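Many other NumPy-style creation functions carry over to torch with the same names; a quick sketch (these are all standard torch functions):

```python
import torch

t_zeros = torch.zeros((2, 3))     # like np.zeros
t_range = torch.arange(0, 10, 2)  # like np.arange: tensor([0, 2, 4, 6, 8])
t_lin = torch.linspace(0, 1, 5)   # like np.linspace: 5 evenly spaced values
t_uniform = torch.rand((3, 3))    # uniform random floats in [0, 1)
```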

Finally you can easily transform Numpy arrays into Pytorch tensors:

t_from_n = torch.tensor(n_array)
t_from_n
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]], dtype=torch.float64)
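Note that torch.tensor always copies the data. If instead you want a tensor that shares memory with the NumPy array, you can use torch.from_numpy; modifying one then modifies the other:

```python
import numpy as np
import torch

n = np.ones((2, 2))
t_copy = torch.tensor(n)        # copies the data
t_shared = torch.from_numpy(n)  # shares memory with n

n[0, 0] = 99
# t_copy still contains 1. at that position, t_shared now contains 99.
```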

And the reverse is true as well: you can recover a Numpy array from a Pytorch tensor:

t_from_n.numpy()
array([[1., 1.],
       [1., 1.],
       [1., 1.]])

Finally, Pytorch tensors are also compatible with Matplotlib, so you can easily have a look at them using e.g. imshow for 2D tensors:

plt.imshow(t_random);
(figure output: the 10×10 random tensor rendered with imshow)

Indexing, broadcasting etc.

The powerful NumPy logic that allows for very efficient selection and combination of array elements is preserved in PyTorch. For example, regular indexing:

t_random
tensor([[140, 120,  92, 222, 238, 117, 239,  31, 244,  94],
        [ 38, 105,  49,  43,  73,  40, 107,  35, 253, 169],
        [  6,  40, 119, 192,  57, 124, 153,  36,  89, 149],
        [238, 235,  21, 147,  44, 109, 251, 112, 159, 192],
        [ 45, 192, 224, 233, 175,  86, 152, 110, 183,  63],
        [135,  80,   7, 147, 140, 123, 227, 112,  39, 112],
        [100, 249, 114,  93, 225,   9, 238, 164, 164, 156],
        [123, 125, 226,   7, 209,  20,  60, 142,  45,  40],
        [ 57, 114,  28, 143,  27, 204, 187,  84, 219, 126],
        [ 25,  65, 144, 200,  41,  22, 225, 153, 215,  31]])
t_random[0,:]
tensor([140, 120,  92, 222, 238, 117, 239,  31, 244,  94])

or broadcasting, which allows you to combine tensors of different but compatible shapes:

torch.ones((3,5)) * torch.randint(0,255, (1,5))
tensor([[193., 242., 134.,  51., 199.],
        [193., 242., 134.,  51., 199.],
        [193., 242., 134.,  51., 199.]])
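The broadcasting rules are the same as in NumPy: dimensions of size 1 are stretched to match. A small sketch building a multiplication table from a column vector and a row vector:

```python
import torch

col = torch.arange(3).reshape(3, 1)  # shape (3, 1)
row = torch.arange(4).reshape(1, 4)  # shape (1, 4)
table = col * row                    # broadcast to shape (3, 4)
# table[i, j] == i * j
```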

We will see that we often need to flatten arrays, for example to create a fully connected layer in a deep learning network. This can be done in two ways. You can use the flatten function/method:

t_random.flatten()
tensor([140, 120,  92, 222, 238, 117, 239,  31, 244,  94,  38, 105,  49,  43,
         73,  40, 107,  35, 253, 169,   6,  40, 119, 192,  57, 124, 153,  36,
         89, 149, 238, 235,  21, 147,  44, 109, 251, 112, 159, 192,  45, 192,
        224, 233, 175,  86, 152, 110, 183,  63, 135,  80,   7, 147, 140, 123,
        227, 112,  39, 112, 100, 249, 114,  93, 225,   9, 238, 164, 164, 156,
        123, 125, 226,   7, 209,  20,  60, 142,  45,  40,  57, 114,  28, 143,
         27, 204, 187,  84, 219, 126,  25,  65, 144, 200,  41,  22, 225, 153,
        215,  31])

Here you can also specify which contiguous dimensions you want to flatten, e.g.:

t_3d = torch.randint(0,100,(2,3,4))
t_3d
tensor([[[23, 67, 96, 77],
         [85, 53, 13, 83],
         [91,  8, 23, 88]],

        [[96, 16,  1, 18],
         [96, 96,  6, 55],
         [19,  7, 97, 60]]])
torch.flatten(t_3d, start_dim=1, end_dim=2)
tensor([[23, 67, 96, 77, 85, 53, 13, 83, 91,  8, 23, 88],
        [96, 16,  1, 18, 96, 96,  6, 55, 19,  7, 97, 60]])
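In a network, this is typically used to flatten everything except the batch dimension. A sketch with made-up image dimensions:

```python
import torch

batch = torch.zeros((8, 3, 28, 28))       # hypothetical batch: 8 RGB images of 28x28
flat = torch.flatten(batch, start_dim=1)  # keep dim 0 (the batch) intact
# flat has shape (8, 3*28*28) = (8, 2352)
```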

The alternative is to use the view method, which, if possible, returns only a view of the array. You can pass compatible dimensions to reshape the tensor, or simply use -1 to completely flatten it.

t_random = torch.randint(0,255,(10,10))
t_random.view(5, 20)
tensor([[130, 134,  63,  67,  49, 141, 235, 237, 178, 150, 249, 157,  69, 116,
           6, 142, 129, 109,   3, 214],
        [ 34, 219,  23,  11,  48, 229,  65, 220,  87, 139,  68, 211, 130, 197,
         198,  70, 218, 208, 200,  62],
        [ 32, 200, 175, 240, 199,  63, 179, 228, 248, 206, 244, 229,  49, 163,
          69, 170, 226,  98,  49,  84],
        [203, 185, 208,  76, 132,  39, 244, 142, 175, 132, 180,  59,  38, 126,
         216, 253,  70, 231, 129, 202],
        [ 10,  21, 221, 126, 215, 206, 216, 211, 245, 168, 159,   9,  54,  49,
          78, 228,  41, 180,  17, 110]])
t_random.view(-1)
tensor([130, 134,  63,  67,  49, 141, 235, 237, 178, 150, 249, 157,  69, 116,
          6, 142, 129, 109,   3, 214,  34, 219,  23,  11,  48, 229,  65, 220,
         87, 139,  68, 211, 130, 197, 198,  70, 218, 208, 200,  62,  32, 200,
        175, 240, 199,  63, 179, 228, 248, 206, 244, 229,  49, 163,  69, 170,
        226,  98,  49,  84, 203, 185, 208,  76, 132,  39, 244, 142, 175, 132,
        180,  59,  38, 126, 216, 253,  70, 231, 129, 202,  10,  21, 221, 126,
        215, 206, 216, 211, 245, 168, 159,   9,  54,  49,  78, 228,  41, 180,
         17, 110])

Since we are dealing with a view, if we modify one of the arrays in place, the values in the other array change as well. This means it is not an independent array but just a shallow copy, so be careful.

view_copy = t_random.view(5,20)
view_copy
tensor([[130, 134,  63,  67,  49, 141, 235, 237, 178, 150, 249, 157,  69, 116,
           6, 142, 129, 109,   3, 214],
        [ 34, 219,  23,  11,  48, 229,  65, 220,  87, 139,  68, 211, 130, 197,
         198,  70, 218, 208, 200,  62],
        [ 32, 200, 175, 240, 199,  63, 179, 228, 248, 206, 244, 229,  49, 163,
          69, 170, 226,  98,  49,  84],
        [203, 185, 208,  76, 132,  39, 244, 142, 175, 132, 180,  59,  38, 126,
         216, 253,  70, 231, 129, 202],
        [ 10,  21, 221, 126, 215, 206, 216, 211, 245, 168, 159,   9,  54,  49,
          78, 228,  41, 180,  17, 110]])
view_copy.fill_(1)
tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
t_random
tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
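If you need an independent copy rather than a view, you can call .clone() (a standard tensor method); a minimal sketch with a deterministic tensor:

```python
import torch

t = torch.arange(100).reshape(10, 10)
independent = t.view(5, 20).clone()  # clone() gives an independent copy
independent.fill_(0)                 # modify the copy in place
# t keeps its original values
```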

Gradients

To be able to perform backpropagation in deep learning networks, we need to be able to calculate all the necessary gradients. This feature is “integrated” into PyTorch tensors directly if we use the requires_grad option. To start with a simple example, let’s first define a variable \(x=1\):

x = torch.ones(1, 1, requires_grad=True)
x
tensor([[1.]], requires_grad=True)

Now we let our variable pass through a few simple operations:

y = 2 * x
z = y**(3/2)
w = 5 * z

Our last variable that depends initially on x is now w. We see that \(w = f(z) = f(g(y)) = f(g(h(x))) = k(x)\) with:

\(f(z) = 5*z\)

\(g(y) = y^{3/2}\)

\(h(x) = 2*x\)

If w needs to be optimized with respect to the variable x, then following the chain rule we need to calculate \(k'(x) = f'(g(h(x))) \cdot g'(h(x)) \cdot h'(x)\):

\(k'(x) = 5 \cdot \frac{3}{2}(2x)^{1/2} \cdot 2\)

This complete calculation can simply be performed by computing the gradient of w, \(dw/dx\):

w.backward()
print(x.grad)
tensor([[21.2132]])

We can verify that we indeed obtain the correct gradient:

5 * (3/2)*(2**0.5) * 2
21.213203435596427

Of course this is an over-simplified example. Calculations become more complex when dealing with actual vectors or tensors but the principle remains the same.
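For instance, with a vector input one usually reduces the output to a scalar (e.g. with sum) before calling backward; a minimal sketch:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w = (x ** 2).sum()  # w = x_1^2 + x_2^2 + x_3^2
w.backward()
# dw/dx_i = 2 * x_i, stored in x.grad
```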

Finally, note that if you want to recover a NumPy array from a PyTorch tensor, or plot a PyTorch tensor with Matplotlib, you first have to detach it from the gradient calculation system (if necessary):

x.numpy()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-30-2527552080a3> in <module>()
----> 1 x.numpy()

RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
x.detach().numpy()
array([[1.]], dtype=float32)

Sending tensors to a GPU

If your computer is equipped with a compatible GPU, or if you run the notebook on Google Colab with a GPU runtime, you can exploit the graphics card’s computing power. For that, the data have to be “pushed” to and “pulled” from that device. We will see later that we can push entire networks there, but for the moment we just send a tensor.

First we have to check whether a GPU is available:

torch.cuda.is_available()
True

If so, we can define a GPU device (a CUDA device, in fact):

dev = torch.device("cuda")
dev
device(type='cuda')

Finally, we can send the data to the “CUDA” device:

mytensor = torch.randn((3,5))
mytensor = mytensor.to(dev)
mytensor
tensor([[ 0.1078, -1.2955,  0.9824, -2.2888, -0.9240],
        [ 1.1793, -1.8081, -1.4458, -1.2130,  0.9562],
        [-0.5964,  1.9136, -1.2986, -0.1035,  1.0607]], device='cuda:0')
mytensor.numpy()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-41-aa7480be9066> in <module>()
----> 1 mytensor.numpy()

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

We see here that we again have difficulties getting the tensor “out” of PyTorch, this time not because it is part of a gradient computation but because it lives on the GPU. So we first need to copy it back to the CPU:

mytensor_CPU = mytensor.cpu()
mytensor_CPU.numpy()
array([[ 0.10778594, -1.2954801 ,  0.98242337, -2.2888114 , -0.9239933 ],
       [ 1.1793374 , -1.8080784 , -1.4457537 , -1.2130216 ,  0.9561631 ],
       [-0.59640384,  1.9136024 , -1.2985845 , -0.10350052,  1.0606741 ]],
      dtype=float32)

You will regularly hit this kind of problem when writing your code, so remember these potential issues when you want to post-process some tensor:

  • you might need to detach it from the gradient calculation

  • you might need to pull it out of the GPU

  • for NN computation, you might need to push your data (tensors) to the GPU
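Putting these together, a common pattern is .detach().cpu().numpy(); the sketch below uses a device-agnostic fallback so it also runs on a machine without a GPU:

```python
import torch

# pick the GPU if available, otherwise fall back to the CPU
dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(2, 2, requires_grad=True)
y = (x * 3).to(dev)                # push the result to the device
result = y.detach().cpu().numpy()  # detach from the graph, pull to CPU, convert
```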

Exercises

  1. Create a tensor of integers in the range 0-100 of size 16x16

  2. Change its “gradient-status” by attaching it to gradient calculation

  3. Solve the problem appearing in (2.) by creating a float32 tensor and attaching the gradient again

  4. Flatten the array to 1d

  5. Transform your flat tensor to a numpy array