Plotting with Matplotlib

In the last section we were introduced to NumPy, a numerical library capable of “basic” numerical analyses on arrays.

We could use Python to analyse data, and then save the result as comma separated values, which are easily imported into e.g. GraphPad or Excel for plotting.

But why stop using Python there?

Python has some powerful plotting and visualization libraries that allow us to generate professional-looking plots in an automated way.

One of the biggest of these libraries is Matplotlib.

Users of Matlab will find that Matplotlib has a familiar syntax.

For example, to plot a 1d array (e.g. stored in variable arr1d) as a line plot, we can use

import matplotlib.pyplot as plt       
plt.plot(arr1d)
plt.show()
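The snippet above assumes that arr1d already exists. As a minimal sketch, with a made-up sine wave standing in for arr1d (the variable x and the "Agg" backend line are assumptions added for illustration), the following shows that plot uses the array index as the x value unless x values are passed explicitly:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data standing in for arr1d
x = np.linspace(0, 2 * np.pi, 50)
arr1d = np.sin(x)

# With a single argument, the x values default to the indices 0, 1, 2, ...
lines_by_index = plt.plot(arr1d)

# With two arguments, the first is used as the x values
lines_by_x = plt.plot(x, arr1d)

# plot returns a list of line objects, one per plotted line
print(len(lines_by_index), len(lines_by_x))
```

To see the result in a window, remove the matplotlib.use line and finish with plt.show().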

Reminder: aliases

Here we used the alias (as) feature when importing matplotlib to save having to type matplotlib.pyplot each time we want to access the pyplot sub-module’s plotting functions!

NOTE

In the following notes, most of the time when I refer to “matplotlib’s X function” (where X is changeable), I’m actually referring to the function found in matplotlib.pyplot, which is matplotlib’s main plotting submodule.

The show function

What’s going on here?

The plot function does just that; it generates a plot of the input arguments.

However, matplotlib doesn’t (by default) show any plots until the show function is called. This is an intended feature; it allows us

  • to create multiple plots/figures before pausing the script to show them
  • to not show any plots/figures at all, and instead save them using a plot saving function
    • this mode of operation is better suited to non-interactive (batch) processing

It is possible to change this feature so that plots are shown as soon as they are created, using matplotlib’s ion function (~ interactive on).
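As a small sketch of how the interactive mode can be switched and queried (the "Agg" backend line and the variable names are assumptions for illustration; in a normal session you would leave the backend alone), pyplot also provides ioff and isinteractive alongside ion:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

plt.ioff()                          # interactive mode off (the usual default in scripts)
state_after_ioff = plt.isinteractive()

plt.ion()                           # interactive mode on: plots are drawn as they are created
state_after_ion = plt.isinteractive()

plt.ioff()                          # switch back so later code behaves as described above
print(state_after_ioff, state_after_ion)
```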

Creating new figures with figure

Often we need to show multiple figures; by default calling several plot commands, one after the other, will cause each new plot to draw over previous plots.

To create a new figure for plotting, call

plt.figure()
# Now we have a new figure that will receive the next plotting command
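A short sketch of this behaviour, under the assumption that we just want two separate line plots (the data and variable names are made up, and the "Agg" backend line is only there so the example runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

plt.close("all")            # start from a clean slate

fig_a = plt.figure()        # first figure
plt.plot([1, 2, 3])         # drawn in fig_a

fig_b = plt.figure()        # second figure becomes the active one
plt.plot([3, 2, 1])         # drawn in fig_b, not over the first plot

print(plt.get_fignums())    # numbers of the currently open figures
```

A single plt.show() at the end of the script would then display both figures.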

A short sample of plot types

Matplotlib is good at performing 2d plotting.

As well as the “basic” plot command, matplotlib.pyplot includes

bar             : Bar plot (also barh for horizontal bars)
barbs           : 2d field of barbs (used in meteorology)
boxplot         : Box and whisker plot (with median, quartiles)
contour         : Contour plot of 2d data (also contourf - filled version)
errorbar        : Error bar plot
fill_between    : Fills between lower and upper lines
hist            : Histogram (also hist2d for 2 dimensional histogram)
imshow          : Image "plot" (with variety of colour maps for grayscale data)
loglog          : Log-log plot
pie             : Pie chart
polar           : Polar plot
quiver          : Plot a 2d field of arrows (vectors)
scatter         : Scatter plot of x-y data
violinplot      : Violin plot - similar to boxplot but width indicates distribution

Rather than copying and pasting content, navigate to the Matplotlib Gallery page to view examples of these and more plots.

You’ll find examples (including source code!) for most of your plotting needs there.
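To give a flavour of how similar these functions are to use, here is a rough sketch that calls three of them (bar, errorbar and fill_between) on some made-up data. The data, the variable names and the "Agg" backend line are assumptions for illustration only:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

plt.close("all")

# Hypothetical data: values with an uncertainty on each point
x = np.arange(5)
y = x ** 2
yerr = 0.5 + 0.1 * y

plt.figure()
bars = plt.bar(x, y)                      # bar plot

plt.figure()
plt.errorbar(x, y, yerr=yerr, fmt="o")    # points with error bars

plt.figure()
plt.fill_between(x, y - yerr, y + yerr, alpha=0.3)   # shaded band between two lines
plt.plot(x, y)                                       # central line on top of the band
```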

Exercise : Simple plotting

First of all, let’s practice plotting 1d and 2d data using some of the plotting functions mentioned above.

Create a new script file (“exercise_mpl_simple.py”), and start off by loading and creating some sample data sets:

  • Load the data you generated at the end of the last section (“growth_data.csv”)
  • Create a 2d array of random (you can pick which distribution) noise, of size 200 x 100.

Create the following plots and display them to the screen

  • A 1d line plot of the first column of the growth data (~t) vs the signal
  • A histogram of the t >= 0 part of the signal data (i.e. a graphical version of the binning statistics you printed to the terminal) - use matplotlib’s hist function.
  • An image plot (i.e. an image) of the 2d data

Customizing the figure

Most plotting functions allow you to specify additional keyword arguments, which determine the plot characteristics.

For example, the line plot example may be customized to have a red dashed line and square marker symbols by updating our first code snippet to

plt.plot(arr1d, '--', color="r", marker="s")

(or the shorthand plt.plot(arr1d, "rs--"))
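As a quick check that keyword arguments and the format-string shorthand describe the same thing, the sketch below plots some made-up data both ways and reads the properties back from the returned line objects. The data, names and "Agg" backend line are assumptions; the keyword for the marker symbol is marker:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

arr1d = np.arange(10) ** 0.5      # hypothetical 1d data

# Keyword-argument version: dashed line, red colour, square markers
(line_kw,) = plt.plot(arr1d, linestyle="--", color="r", marker="s")

# Format-string shorthand: "r" = red, "s" = square marker, "--" = dashed
(line_fmt,) = plt.plot(arr1d + 1, "rs--")

for line in (line_kw, line_fmt):
    print(line.get_linestyle(), line.get_marker(), line.get_color())
```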

Axis labels, legends, etc

Additional figure and axes properties can also be modified. To do so, we need to access the figure or axes object, respectively.

To create a new (empty) figure and corresponding figure object, use the figure function:

fig1 = plt.figure()
# Now we can modify the figure properties, e.g.
# we can set the figure's width (in inches - intended for printing!)
fig1.set_figwidth(10)
# Then setting the figure's dpi (dots-per-inch) will determine the 
# number of pixels this corresponds to... 
fig1.set_dpi(300)       # I.e. the figure width will be 3000 px!
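The arithmetic in the comment above can be checked directly. This is only a sketch (the height of 4 inches and the variable names are assumptions; the "Agg" backend line is there so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

fig1 = plt.figure()
fig1.set_figwidth(10)     # width in inches
fig1.set_figheight(4)     # height in inches (assumed value for this example)
fig1.set_dpi(300)         # dots per inch

# Size in pixels = size in inches * dots per inch
width_px = fig1.get_figwidth() * fig1.get_dpi()
height_px = fig1.get_figheight() * fig1.get_dpi()
print(width_px, height_px)
```

The same settings can also be passed when the figure is created, e.g. plt.figure(figsize=(10, 4), dpi=300).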

If instead we wish to modify an already created figure object, we can either get a figure object by passing its number property to the figure function, e.g.

f = plt.figure(10) # Get figure number 10

or more commonly, we get the active figure (usually the last created figure) using

f = plt.gcf()

Axes are handled in a similar way;

ax = plt.axes()

creates a default axes in the current figure (creates a figure if none are present), while

ax = plt.gca()

gets the current (last created) axes object.

Axes objects give you access to changing background colours, axis colours, and many other axes properties.

Some of the most common ones can be modified for the current axes without needing to access the axes object, as matplotlib.pyplot has convenience functions for this, e.g.

plt.title("Aaaaarghh")          # Sets current axes title to "Aaaaarghh"

plt.xlabel("Time (s)")          # Sets current axes xlabel to "Time (s)"
plt.ylabel("Amplitude (arb.)")  # Sets current axes ylabel to "Amplitude (arb.)"

# Set the current axes y-tick locations and labels (requires import numpy)
plt.yticks( numpy.arange(5), ('Tom', 'Dick', 'Harry', 'Sally', 'Sue') )
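The same changes can be made through the axes object itself, which is handy once a script has more than one axes. The following is a sketch with made-up scores (the data, the label text and the "Agg" backend line are assumptions); set_title, set_xlabel, set_yticks and set_yticklabels are the axes-level counterparts of the pyplot functions above:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

names = ["Tom", "Dick", "Harry", "Sally", "Sue"]
scores = [3, 1, 4, 1, 5]                 # hypothetical data

fig, ax = plt.subplots()                 # create a figure and a single axes
ax.barh(np.arange(5), scores)            # horizontal bars at y = 0..4

ax.set_title("Scores")                   # same effect as plt.title(...)
ax.set_xlabel("Score (arb.)")            # same effect as plt.xlabel(...)
ax.set_yticks(np.arange(5))              # tick locations
ax.set_yticklabels(names)                # tick labels
```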

A note on style

Matplotlib’s default plotting styles are often not considered to be desirable, with critics citing styles such as the R package ggplot’s default style as much more “professional looking”.

Fortunately, this criticism has not fallen on deaf ears, and while it has always been possible to create your own customized style, Matplotlib now includes additional styles by default as “style sheets”.

In particular, styles such as ggplot are very popular and essentially emulate ggplot’s style (axes colourings, fonts, etc).

The style may be changed before starting to plot using e.g.

plt.style.use('ggplot')

A list of available styles may be accessed using

plt.style.available

(on the current machine this holds : ['dark_background', 'fivethirtyeight', 'bmh', 'grayscale', 'ggplot'])

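You may define your own style file (with the extension .mplstyle), and after placing it in the stylelib folder of your Matplotlib configuration directory (e.g. “~/.config/matplotlib/stylelib” on Linux; matplotlib.get_configdir() reports the location on your machine) you may access your style in the same way.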

Styles may also be composed together using

plt.style.use(['ggplot', 'presentation'])

where the rules in the right-most style will take precedence (here 'presentation' stands for a user-defined style).
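If you only want a style for part of a script, pyplot's style module also offers a context manager, plt.style.context. The sketch below (made-up data; the "Agg" backend line and variable names are assumptions) applies the ggplot style to one figure and shows that the settings revert afterwards:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

print("ggplot" in plt.style.available)

with plt.style.context("ggplot"):
    # Inside the block the ggplot settings are active
    fig, ax = plt.subplots()
    ax.plot([1, 4, 9, 16])
    facecolor_inside = plt.rcParams["axes.facecolor"]

# Outside the block the previous settings are restored
facecolor_outside = plt.rcParams["axes.facecolor"]
print(facecolor_inside, facecolor_outside)
```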

Exercise : trying a new style

Repeat the simple plotting exercise with the "ggplot" style; to do this, copy and paste the code from the simple plotting exercise but add the code to change the style to the ggplot style before the rest of the code.

3d plotting

While matplotlib includes basic 3d plotting functionality (via mplot3d), we recommend using a different library if you need to generate “fancy” looking 3d plots.

A good alternative for such plots is a package called mayavi.

However this is not included with WinPython and the simplest route to installing it involves installing the Enthought Tool Suite (http://code.enthought.com/projects/).

For the meantime though, if you want to plot 3d data, stick with the mplot3d submodule.

For example, to generate a 3d surface plot of the 2d data (i.e. the “pixel” value would correspond to the height in z), we could use

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z)

where X, Y, Z are 2d data values.

X and Y are 2d representations of the axis values which can be generated using utility functions, e.g. pulling an example from the Matplotlib website,

from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')  # recent Matplotlib versions no longer accept fig.gca(projection='3d')
X = np.arange(-5, 5, 0.25)
Y = np.arange(-5, 5, 0.25)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm,
                       linewidth=0, antialiased=False)
ax.set_zlim(-1.01, 1.01)

ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))

fig.colorbar(surf, shrink=0.5, aspect=5)

plt.show()

[Figure: 3d plot demo]
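Coming back to the case of plotting existing 2d data as a surface: a minimal sketch, assuming the data is a small array of random noise (Z, its size and the "Agg" backend line are all assumptions for illustration), builds X and Y from the column and row indices with meshgrid so that all three arrays have the same shape:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older versions

rng = np.random.default_rng(0)
Z = rng.normal(size=(20, 10))            # hypothetical 2d data: 20 rows, 10 columns

# X holds the column index and Y the row index of every "pixel"
n_rows, n_cols = Z.shape
X, Y = np.meshgrid(np.arange(n_cols), np.arange(n_rows))
print(X.shape, Y.shape, Z.shape)         # all three shapes match

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(X, Y, Z)                 # height of the surface = "pixel" value
```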

Saving plots

Now that we have an idea of the types of plots that can be generated using Matplotlib, let’s think about saving those plots.

If you call the show command, the resulting figure window includes a basic toolbar for zooming, panning, and saving the plot.

However, as a Pythonista, you will want to automate saving the plot (and probably forego viewing the figures until the script has batch processed your 1,000 data files!).

The command to save the currently active figure is

plt.savefig('filename.png')

Alternatively, if you store the result of the plt.figure() function in a variable as mentioned above you can subsequently use the savefig member-function of that figure object to specifically save that figure, even if it is no longer the active figure; e.g.

fig1 = plt.figure()

#
# Other code that generates more figures
#
# ...
#


fig1.savefig('figure1.pdf')
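For batch processing, a common pattern is to create, save and close one figure per data set. The following is only a sketch: the data sets, file names and the temporary output folder are made up, and the "Agg" backend line is there so that it runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

plt.close("all")

# Hypothetical data sets; in practice these might be loaded from files
datasets = {"run_a": [1, 2, 4, 8], "run_b": [8, 4, 2, 1]}

out_dir = tempfile.mkdtemp()       # temporary folder for the output files
saved_paths = []

for name, data in datasets.items():
    fig = plt.figure()
    plt.plot(data)
    plt.title(name)

    path = os.path.join(out_dir, name + ".png")
    fig.savefig(path)              # save this specific figure
    plt.close(fig)                 # free the figure's memory before the next one
    saved_paths.append(path)

print(saved_paths)
```

Closing each figure after saving matters when processing many files, since open figures stay in memory until closed.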

Note on formats: Raster vs Vector

The filename extension that you specify in the savefig function determines the output type; matplotlib supports

  • png - good for images!
  • jpg (not recommended except for drafts!)
  • pdf - good for (vector) line plots
  • svg

as well as several others.

The first two formats are known as raster formats, which means that the entire figure is saved as a bunch of pixels.

This means that if you open the resulting file with an image viewer, and zoom in, eventually the plot will look pixelated/blocky.

The second two formats are vector formats, which means that a line plot is saved as a collection of x,y points specifying the line start and end points, text is saved as a location and text data and so on.

If you open a vector format file with e.g. a PDF file viewer, and zoom in, the lines and text will always have smooth edges and never look blocky.

As a general guideline, you should choose to mainly use vector file (pdf, eps, or svg) output for plots as they will preserve the full data, while raster formats will convert the plot data into pixels.

In addition, if you have vector image editing software like Inkscape or Adobe Illustrator, you will be able to open the vector image and much more easily change line colours, fonts etc.
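For raster output, the number of pixels is set by the figure size in inches multiplied by the dpi passed to savefig. As a sketch (made-up data, file names and a temporary folder; the "Agg" backend line is an assumption so the example runs without a display), we can save the same figure in both kinds of format and read the PNG back to check its size:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 1, 20)                 # hypothetical data
fig = plt.figure(figsize=(4, 3))          # 4 x 3 inches
plt.plot(t, np.sqrt(t))

out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "demo.png")
pdf_path = os.path.join(out_dir, "demo.pdf")

fig.savefig(png_path, dpi=100)            # raster: 4 in * 100 dpi wide, 3 in * 100 dpi high
fig.savefig(pdf_path)                     # vector: no fixed pixel grid for the lines and text

pixels = plt.imread(png_path)             # read the PNG back as an array
print(pixels.shape[:2])                   # (rows, columns) of the saved image
```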

Other annotations

We already saw how to add a title and x and y labels; below are some more examples of annotations that can be added.

You could probably guess most of what the functions do (or have guessed what the function would be called!):

plt.title("I'm a title") # Note single quote inside double quoted string is ok!
plt.xlabel(r"Time ($\mu$s)")        # We can include LaTeX for symbols etc. (the raw string keeps the backslash)
plt.ylabel("Intensity (arb. units)")

plt.legend(["Series 1", "Series 2"])    # Labels are passed as a list, one per plotted line

plt.colorbar()      # Show a colour-bar (usually for image data). 

# Add in an arrow
plt.arrow(0, 0, 0.5, 0.5, head_width=0.05, head_length=0.1, fc='k', ec='k')

The last command illustrates how, as with most plotting commands, arrow accepts a large number of additional keyword arguments (aka kwargs) to customize colours (edge and face), widths, and many more properties.
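The commands above need something to annotate. A self-contained sketch (the data, names and "Agg" backend line are assumptions) that gives each line a label when it is plotted, so that legend can pick the labels up, and that attaches a colour-bar to an image:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 10, 100)               # hypothetical time axis

# Figure 1: two labelled lines and a legend
fig1, ax1 = plt.subplots()
ax1.plot(t, np.sin(t), label="Series 1")
ax1.plot(t, np.cos(t), label="Series 2")
ax1.set_xlabel(r"Time ($\mu$s)")
legend = ax1.legend()                     # uses the label given to each line
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)

# Figure 2: image data with a colour-bar
fig2, ax2 = plt.subplots()
image = ax2.imshow(np.outer(np.sin(t), np.cos(t)))
fig2.colorbar(image)                      # colour-bar for this specific image
```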

Exercise : comparing raster and vector formats

Create a new script file (exercise_mpl_raster_vs_vector.py) and create a simple line plot of y=x^2 between x=-10 and x=10.

Add in labels for the x and y axes; call the x axis "x (unitless)" and the y axis “y = x^2” (see if you can figure out how to render the squared symbol, i.e. a super-script 2; those of you familiar with LaTeX should find this easy!).

Save the plot as both a png and pdf file, and open both side-by-side. Zoom in to compare the two formats.

Sub-plots

There are two common approaches to creating sub-plots.

The first is to create individual figures with well thought out sizes (especially font sizes!), and then combine them using an external application.

Matplotlib can also simplify this step for us, by providing subplot functionality. We could, when creating axes, reposition them ourselves in the figure. But this would be tedious, so Matplotlib offers convenience functions to automatically lay out axes for us.

The older of these is the subplot command, which lays axes out in a regular grid.

For example

plt.subplot(221)        # Create an axes object with position and size for 2x2 grid
plt.plot([1,2,3])
plt.title("Axes 1")

plt.subplot(222)    
plt.plot([1,2,3])   
plt.title("Axes 2")

plt.subplot(223)    
plt.plot([1,2,3])
plt.title("Axes 3")

plt.subplot(224)    
plt.plot([1,2,3])
plt.title("Axes 4") 

Note that the subplot command uses 1-indexing of axes (not 0-indexing!). The above could be simplified (at least in some scenarios) using a for-loop, e.g.

# datasets contains 4 items, each corresponding to a 1d dataset... 
for i, data in enumerate(datasets):
    plt.subplot(2,2,i+1)        
    plt.plot(data)
    plt.title("Data %d"%(i+1))
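For regular grids, pyplot also has a subplots function (note the plural) that creates the figure and all of the axes in one call and returns the axes as an array. A sketch of the loop above written that way (the four data sets are made up, and the "Agg" backend line is an assumption so the example runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Hypothetical 1d data sets, one per axes
datasets = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [2, 1, 3]]

fig, axes = plt.subplots(2, 2)            # axes is a 2x2 array of axes objects

# axes.flat walks through the grid row by row
for i, (ax, data) in enumerate(zip(axes.flat, datasets)):
    ax.plot(data)
    ax.set_title("Data %d" % (i + 1))

fig.tight_layout()                        # reduce overlap between titles and tick labels
```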

Recently, Matplotlib added a new way of generating subplots, which makes it easier to generate non-uniform grids.

This is the subplot2grid function, which is used as follows,

ax1 = plt.subplot2grid((3,3), (0,0), colspan=3)
ax2 = plt.subplot2grid((3,3), (1,0), colspan=2)
ax3 = plt.subplot2grid((3,3), (1, 2), rowspan=2)
ax4 = plt.subplot2grid((3,3), (2, 0))
ax5 = plt.subplot2grid((3,3), (2, 1))

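This would generate a grid of five (unlabelled) axes: ax1 spans the whole top row, ax2 spans the first two columns of the middle row, ax3 spans the middle and bottom rows of the last column, and ax4 and ax5 fill the two remaining cells of the bottom row.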

Exercise : Adding subplots and insets

Download the image from here: Image File.

Create a new script (“exercise_mpl_gridplots.py”), and

  1. Load the image using matplotlib.pyplot’s imread convenience function
  2. The image may be loaded into a 3d array - RGB(A) - if so, convert it to grayscale by taking the mean of the first 3 colour channels
  3. Create a grid of 3 plots, with the following shape
---------
| c 8x2 |
------------
|       || |
|   b   ||a|
|  8x6  || |  <- a is 2x6
|       || |
------------

Here,

  • a is a maximum projection onto the y axis (2x6 panel)
  • b is the full grayscale image (8x6 panel)
  • c is a maximum projection onto the x axis (8x2 panel)

Next create a similar subplot figure but this time, instead of performing maximum projections along the x and y axes, generate the following mean line-profiles:

  • A profile along a horizontal line at y=147 averaged over a width of 5 pixels (i.e. averaging from y=145 to 149 inclusive)
  • A profile along a vertical line at x=212 averaged over a width of 5 pixels (i.e.