13. Matplotlib: adjusting non-data elements#

We will see other types of plots later on, but before that, we want to explore how to adjust the many elements of a plot: titles, axes, ticks etc.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize
diams = pd.read_csv('https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/Ecdat/Diamond.csv')

def parabola(x, a, b, c):
    return a * x**2 + b*x + c
fit_params, _ = scipy.optimize.curve_fit(parabola, diams.carat, diams.price)

Modify the axis#

The axis properties can be modified via methods attached to the ax object. For example we can set limits:

fig, ax = plt.subplots()
ax.plot(diams.carat, diams.price, 'ro', alpha=0.1);
ax.plot(np.arange(0,1.5,0.1), parabola(np.arange(0,1.5,0.1), *fit_params), linestyle='-.', color='green');
ax.set_xlim(left=0.1, right=1.2)
ax.set_ylim(bottom=1, top=20000);
_images/a500002e8a53844d100806cedf07b67c3c507f7f4183e12c231f712f046a2433.png
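The limits can also be read back, and you do not have to set both bounds at once. The following is a minimal sketch that uses synthetic data as a stand-in for the diamonds table (the arrays carat and price are assumptions made for illustration, not the real dataset):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the diamonds data (assumption: not the real dataset)
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 1.1, 200)
price = 5000 * carat**2 + 1000 * carat + 500

fig, ax = plt.subplots()
ax.plot(carat, price, 'ro', alpha=0.1)

# Set only the left bound; the right bound keeps its current automatic value
ax.set_xlim(left=0.1)
ax.set_ylim(bottom=0, top=20000)

# Read the limits back as (low, high) pairs
x_low, x_high = ax.get_xlim()
y_low, y_high = ax.get_ylim()
print(x_low, x_high, y_low, y_high)
```

Reading the limits back can be handy when a second plot should reuse exactly the same range as the first one.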

We can also change the scale of an axis. For example, we can turn our plot into a log plot by changing the scale of the y axis:

fig, ax = plt.subplots()
ax.plot(diams.carat, diams.price, 'ro', alpha=0.1);
ax.plot(np.arange(0,1.5,0.1), parabola(np.arange(0,1.5,0.1), *fit_params), linestyle='-.', color='green');
ax.set_yscale('log');
_images/bb74ca90bf1d42dee6baa019d6bcebf2b95ecfbbd3039f58a9c7295c3c484dce.png
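The same approach works for the x axis through set_xscale, and the current scale can be queried with get_xscale and get_yscale. Here is a small sketch of a log-log plot, again with synthetic data standing in for the diamonds table (an assumption for illustration); note that all values are positive, as a log scale cannot display zero or negative values:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, strictly positive stand-in for the diamonds data (assumption)
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 1.1, 200)
price = 5000 * carat**2 + 1000 * carat + 500

fig, ax = plt.subplots()
ax.plot(carat, price, 'ro', alpha=0.1)

# Switch both axes to a logarithmic scale
ax.set_xscale('log')
ax.set_yscale('log')

# Query the scales that are currently in use
print(ax.get_xscale(), ax.get_yscale())
```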

Adding labels#

One should always label all axes. We can do this with the set_xlabel and set_ylabel methods:

ax.set_xlabel('carat')
ax.set_ylabel('price')
fig
_images/ca25eedde580b0d425f88172a03898bbed0e72af84a0a1c5161f6495041022fc.png

Avoid hard-coding#

Ideally you should avoid hard-coding too much information throughout your code. For example, you might produce several plots from the same data, and if you decide to change the labelling you want to avoid having to manually update the label in every single plot. Here’s a possible solution: create a dictionary that maps each column name to a label. Whenever you plot a specific column, you then just look up its label in that dictionary!

label_dict = {'carat': 'Weight [ct]', 'price': 'Price [USD]'}

Now when we want to create a plot, we can set everything at the beginning:

x_axis = 'carat'
y_axis = 'price'
fig, ax = plt.subplots()
ax.plot(diams[x_axis], diams[y_axis], 'ro', alpha=0.1);
ax.plot(np.arange(0,1.5,0.1), parabola(np.arange(0,1.5,0.1), *fit_params), linestyle='-.', color='green');
ax.set_xlabel(label_dict[x_axis])
ax.set_ylabel(label_dict[y_axis]);
_images/49467f0f800951452d806dadf25aa9833096f472a1fc18e13350180adaa3d714.png
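One possible way to push this idea a bit further is to wrap the labelling in a small function. The sketch below is only an illustration: label_axes is a hypothetical helper (not part of Matplotlib), and the data are synthetic. It falls back to the raw column name when a column has no entry in the dictionary:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the diamonds data (assumption: not the real dataset)
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 1.1, 200)
price = 5000 * carat**2 + 1000 * carat + 500

label_dict = {'carat': 'Weight [ct]', 'price': 'Price [USD]'}

def label_axes(ax, x_col, y_col, labels):
    """Label both axes from a column-name -> label mapping (hypothetical helper)."""
    # Fall back to the raw column name when no label is registered
    ax.set_xlabel(labels.get(x_col, x_col))
    ax.set_ylabel(labels.get(y_col, y_col))

fig, ax = plt.subplots()
ax.plot(carat, price, 'ro', alpha=0.1)
label_axes(ax, 'carat', 'price', label_dict)

# A column without a dictionary entry keeps its own name as label
fig2, ax2 = plt.subplots()
ax2.plot(carat, price / carat, 'ro', alpha=0.1)
label_axes(ax2, 'carat', 'price_per_carat', label_dict)
```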

More advanced labels#

Sometimes you want to use variables in your label, so that the label adjusts exactly to your current setting. For example, let’s imagine that you want to scale your y axis by some factor and would like this to be reflected in the label. There are different ways to achieve this, but the most “modern” one is to use f-strings:

factor = 1000
fig, ax = plt.subplots()
ax.plot(diams[x_axis], diams[y_axis]/factor, 'ro', alpha=0.1);
ax.plot(np.arange(0,1.5,0.1), parabola(np.arange(0,1.5,0.1), *fit_params)/factor, linestyle='-.', color='green');
ax.set_xlabel(label_dict[x_axis])
ax.set_ylabel(f'{label_dict[y_axis]} / {factor}');
_images/45aeb3b1ef037a5ab53f9be1e7527cc54658bee7585808333f0500df02cba00a.png
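f-strings also accept a format specification after a colon, which can make numbers in labels easier to read. As a small sketch with synthetic data (the arrays and the name base_label are assumptions for illustration), the thousands separator can be added like this:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the diamonds data (assumption: not the real dataset)
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 1.1, 200)
price = 5000 * carat**2 + 1000 * carat + 500

factor = 1000
base_label = 'Price [USD]'

fig, ax = plt.subplots()
ax.plot(carat, price / factor, 'ro', alpha=0.1)

# The ',' format specifier inserts a thousands separator
y_label = f'{base_label} / {factor:,}'
ax.set_ylabel(y_label)
print(y_label)  # Price [USD] / 1,000
```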

Finally, if you are familiar with \(\LaTeX\), you can also use it in labels, as well as in any other text. For example, let’s imagine that we are dealing with micrograms on the x axis; we could use:

w_factor = 10**6
fig, ax = plt.subplots()
ax.plot(diams[x_axis]*w_factor, diams[y_axis], 'ro', alpha=0.1);
ax.plot(np.arange(0,1.5,0.1)*w_factor, parabola(np.arange(0,1.5,0.1), *fit_params), linestyle='-.', color='green');
# raw string (r'...') so that Python does not treat \m as an escape sequence
ax.set_xlabel(r'Weight [$\mu g$]')
ax.set_ylabel(label_dict[y_axis]);
_images/d25af99ea8c3b3b99f50a4fa425435e7bf628beeb2051aa433401a37430e8355.png
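\(\LaTeX\) and f-strings can be combined, with two things to watch for: a raw string keeps backslashes intact, and literal braces have to be doubled inside an f-string. A minimal sketch, where the variable exponent and the resulting unit text are purely illustrative assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt

exponent = 6

# rf'...' is both raw (backslashes kept) and formatted (variables inserted).
# '{{' and '}}' produce literal braces, '{exponent}' inserts the value.
x_label = rf'Weight [$10^{{-{exponent}}}$ g]'
print(x_label)  # Weight [$10^{-6}$ g]

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9], 'ro')
ax.set_xlabel(x_label)

# Drawing the figure makes Matplotlib parse the math text
fig.canvas.draw()
```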

Adding a title#

You can easily add a title to your plot using the set_title() method. It accepts the same kinds of text as the labels, so you can use f-strings, \(\LaTeX\) etc. As a general rule, take care not to repeat in the title what is already shown in the axis labels:

fig, ax = plt.subplots()
ax.plot(diams[x_axis], diams[y_axis], 'ro', alpha=0.1);
ax.plot(np.arange(0,1.5,0.1), parabola(np.arange(0,1.5,0.1), *fit_params), linestyle='-.', color='green');
ax.set_xlabel(label_dict[x_axis])
ax.set_ylabel(label_dict[y_axis]);
ax.set_title('Figure 1');
_images/c94d9c3f27c7c4505bbfa6d3a63d4612df541e1862064b8390b144ba63e17925.png

One function to set them all#

Instead of calling all the different methods (set_xlabel, set_title etc.), you can also use the set method of the ax object and pass your settings as keyword arguments:

fig, ax = plt.subplots()
ax.plot(diams[x_axis], diams[y_axis], 'ro', alpha=0.1);
ax.plot(np.arange(0,1.5,0.1), parabola(np.arange(0,1.5,0.1), *fit_params), linestyle='-.', color='green');
ax.set(xlim=(0.1, 1.2), ylim=(1, 20000), xlabel=label_dict[x_axis], ylabel=label_dict[y_axis], title='Figure 1');
_images/e5cab53ae45db127c4c641b27e8ba0fd41e870e67633d6417f29c464401d1a60.png
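The set method is generic: a keyword such as yscale is forwarded to the corresponding set_yscale method, so it is not limited to text. A short sketch with synthetic data (an assumption for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the diamonds data (assumption: not the real dataset)
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 1.1, 200)
price = 5000 * carat**2 + 1000 * carat + 500

fig, ax = plt.subplots()
ax.plot(carat, price, 'ro', alpha=0.1)

# Each keyword maps to the matching set_<keyword> method of the axes
ax.set(
    xlabel='Weight [ct]',
    ylabel='Price [USD]',
    title='Figure 1',
    xlim=(0.1, 1.2),
    yscale='log',
)
print(ax.get_title(), ax.get_yscale())
```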

Adjusting the text itself#

There are multiple ways to adjust the text in a figure. The first solution is to adjust the parameters of each element in the plot, using keyword options such as size or weight:

fig, ax = plt.subplots()
ax.plot(diams[x_axis], diams[y_axis], 'ro', alpha=0.1);
ax.plot(np.arange(0,1.5,0.1), parabola(np.arange(0,1.5,0.1), *fit_params), linestyle='-.', color='green');
ax.set_xlabel(label_dict[x_axis], size=12)
ax.set_ylabel(label_dict[y_axis], weight=1000, color='red');
ax.set_title('Figure 1', family='monospace', size=20);
_images/847c1ea094728f32a9384fe2a8c1182ba7b50bc128ce188764a585ff6d41f8dc.png
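To keep several text elements consistent, one option is to store the shared options in a dictionary and unpack it in each call. The methods also return the text object they create, which can be adjusted afterwards. A sketch under these assumptions (synthetic data, and label_style is just an illustrative name):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the diamonds data (assumption: not the real dataset)
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 1.1, 200)
price = 5000 * carat**2 + 1000 * carat + 500

# Shared text options, defined once and reused for both labels
label_style = dict(size=12, color='darkblue')

fig, ax = plt.subplots()
ax.plot(carat, price, 'ro', alpha=0.1)
ax.set_xlabel('Weight [ct]', **label_style)
ax.set_ylabel('Price [USD]', **label_style)

# set_title returns the Text object, which can be modified later on
title = ax.set_title('Figure 1', family='monospace', size=20)
title.set_weight('bold')
```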

Changing tick marks#

The last elements that we have not yet updated are the tick marks. First, let’s reset the defaults (we’ll come back to this later):

plt.style.use('default')

The main way to update tick marks is the tick_params method. It offers a series of options (with self-explanatory names such as color, labelsize etc.) to set everything. For example:

fig, ax = plt.subplots()
ax.plot(diams[x_axis], diams[y_axis], 'ro', alpha=0.1);
ax.tick_params(colors='red',labelsize=15, top=True, labelright=True)
_images/f653c5254489bdf1a139677ed2aebd9aa8368b67b151b30c3fbd0cd95e756894.png

As you can see, our options were applied to all tick marks. We can however specify which axis we want to affect using the axis option:

fig, ax = plt.subplots()
ax.plot(diams[x_axis], diams[y_axis], 'ro', alpha=0.1);
ax.tick_params(axis='x', colors='red',labelsize=15, top=True, labelright=True)
ax.tick_params(axis='y', colors='green',labelsize=15)
_images/0a505fca91082364a0ff40400f9277e1407b18e7c9887af68b0c3408b50a23f0.png

Since we restricted the first call to the x axis, its labelright option, which concerns the y axis, has no effect.
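tick_params also controls the geometry of the ticks themselves, for example their direction, length and width, as well as the rotation of the tick labels. A small sketch with synthetic data (an assumption for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the diamonds data (assumption: not the real dataset)
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 1.1, 200)
price = 5000 * carat**2 + 1000 * carat + 500

fig, ax = plt.subplots()
ax.plot(carat, price, 'ro', alpha=0.1)

# Ticks pointing inwards, longer and thicker than the default, on both axes
ax.tick_params(axis='both', direction='in', length=8, width=2)

# Rotated, slightly larger tick labels on the x axis only
ax.tick_params(axis='x', labelrotation=45, labelsize=12)

first_x_tick = ax.xaxis.get_major_ticks()[0]
print(first_x_tick.tick1line.get_markersize(), first_x_tick.label1.get_size())
```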

Grid lines#

Finally, we can also add grid lines to our plot. This is achieved via the grid method:

fig, ax = plt.subplots()
ax.plot(diams[x_axis], diams[y_axis], 'ro', alpha=0.1);
ax.tick_params(axis='x', colors='red',labelsize=15, top=True, labelright=True)
ax.tick_params(axis='y', colors='green',labelsize=15)
ax.grid(which='major', axis='x', linewidth=3, color='pink')
ax.grid(which='major', axis='y', linewidth=1, color='blue')
_images/66e6ddb32bafa07927a4ee1c361efeaee8720d00eca2b326de73aea22848d83b.png
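The which option also accepts 'minor', but minor grid lines only appear once minor ticks are switched on. A possible sketch, using synthetic data as an assumption, that combines a solid major grid with a fainter dotted minor grid:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the diamonds data (assumption: not the real dataset)
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 1.1, 200)
price = 5000 * carat**2 + 1000 * carat + 500

fig, ax = plt.subplots()
ax.plot(carat, price, 'ro', alpha=0.1)

# Minor ticks are off by default; turn them on so that a minor grid can be drawn
ax.minorticks_on()
ax.grid(which='major', linewidth=1, color='gray')
ax.grid(which='minor', linewidth=0.5, linestyle=':', color='lightgray')

# Draw the grid behind the data points rather than on top of them
ax.set_axisbelow(True)

n_minor = len(ax.xaxis.get_minorticklocs())
print(n_minor)
```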

General templates#

Instead of specifying everything manually, you can also use pre-made style templates. The Matplotlib documentation shows a gallery of the available styles, and plt.style.available lists their names. To apply a style, pass its name to plt.style.use:

plt.style.use('ggplot')
fig, ax = plt.subplots()
ax.plot(diams[x_axis], diams[y_axis], 'ro', alpha=0.1);
ax.plot(np.arange(0,1.5,0.1), parabola(np.arange(0,1.5,0.1), *fit_params), linestyle='-.', color='green');
ax.set_xlabel(label_dict[x_axis])
ax.set_ylabel(label_dict[y_axis]);
ax.set_title('Figure 1');
_images/d88bcc7ab61274320c85ae42c54a38d0ad7bb5730cc8686b2b4454428db95ea7.png
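Note that plt.style.use changes the settings for all subsequent figures. If a style should apply to a single figure only, one option is the plt.style.context context manager, which restores the previous settings when the block ends. A minimal sketch with synthetic data (an assumption for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the diamonds data (assumption: not the real dataset)
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 1.1, 200)
price = 5000 * carat**2 + 1000 * carat + 500

# Names of all styles that can be passed to plt.style.use
print('ggplot' in plt.style.available)

face_before = plt.rcParams['axes.facecolor']

# The style is only active inside the with block
with plt.style.context('ggplot'):
    face_inside = plt.rcParams['axes.facecolor']
    fig, ax = plt.subplots()
    ax.plot(carat, price, 'ro', alpha=0.1)
    ax.set_title('Figure 1')

# Outside the block, the previous settings are back
face_after = plt.rcParams['axes.facecolor']
print(face_before, face_inside, face_after)
```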

Saving the figure#

You can easily save your figure in common formats such as PNG and JPG with the savefig method attached to the fig object. As usual, there are many options; one of the most important is the resolution, dpi:

fig, ax = plt.subplots()
ax.plot(diams[x_axis], diams[y_axis], 'ro', alpha=0.1);
ax.set_xlabel(label_dict[x_axis], size=25)

fig.savefig('myfigure.png', dpi=50)
_images/9d56cb9fa7552bc97eb7e5258baf78ed269250c23718239f35886fcfda13f63a.png

Note that in some cases, depending on the size of your labels, Matplotlib might cut off some of the elements at the edges (as happens above). To make sure that this doesn’t happen and that you have minimal white space around your figure, you can use the very useful tight_layout:

fig, ax = plt.subplots()
ax.plot(diams[x_axis], diams[y_axis], 'ro', alpha=0.1);
ax.set_xlabel(label_dict[x_axis], size=25)
plt.tight_layout()
fig.savefig('myfigure2.png', dpi=50)
_images/11bc1f2f36c8166da1c703e108a5fd0211d74126526f17a66a90549a40585eb6.png
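The dpi option sets the size of the saved image in pixels: it is the figure size in inches multiplied by dpi. The sketch below illustrates this by saving into an in-memory buffer instead of a file (a choice made here only to avoid writing to disk) and reading the dimensions from the PNG header. It also shows bbox_inches='tight', a savefig option that crops the output to the drawn content. The data are synthetic (an assumption):

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the diamonds data (assumption: not the real dataset)
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 1.1, 200)
price = 5000 * carat**2 + 1000 * carat + 500

# 4 x 3 inches at 50 dpi gives a 200 x 150 pixel image
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(carat, price, 'ro', alpha=0.1)
ax.set_xlabel('Weight [ct]', size=25)

buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=50)
png_bytes = buffer.getvalue()

# Width and height are stored as big-endian integers in bytes 16 to 23 of a PNG
width, height = struct.unpack('>II', png_bytes[16:24])
print(width, height)  # 200 150

# bbox_inches='tight' crops (or extends) the output to fit everything that was drawn
tight_buffer = io.BytesIO()
fig.savefig(tight_buffer, format='png', dpi=50, bbox_inches='tight')
```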

Exercise#

  1. Load the penguins data https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv

  2. Using one of the scatter plots of body_mass_g as a function of bill_depth_mm (separated by species or not), try to customize the plot to match the example below.

  3. Choose a style and apply it to your plot.