The quintessential Python library for data visualization is Matplotlib. It’s easy to use and flexible, and many other visualization libraries build on top of it. This means that learning Matplotlib will make it easier to understand and work with some of the fancier visualization libraries.
Getting started
You’ll need to install the Matplotlib library. Assuming you have a terminal at your disposal and you have pip installed, you can install Matplotlib with the following command: pip install matplotlib. You can read more about the installation in Matplotlib’s installation guide.
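To confirm the installation worked, a quick check is to import the package and print its version. This is only a minimal sketch, and it assumes the install went into the same Python environment you are running:

```python
import matplotlib

# If this import succeeds, Matplotlib is available in the current environment.
version = matplotlib.__version__
print(version)
```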
Object-oriented approach
We’ll begin by making a simple scatter chart. To start with, we have to import Matplotlib. The pyplot module, conventionally imported as plt, is what we’ll use for plotting.
import matplotlib.pyplot as plt
import numpy as np
We also import numpy, so we can easily generate points to plot! Let’s pick some points on the sine function. We choose some x-values and then calculate the y-values with np.sin.
x = np.linspace(-3, 3, num=10)
y = np.sin(x)
Now that we’ve generated our points, we can make our scatter chart! We start by making a Figure object and an Axes object.
fig = plt.figure()
ax = fig.add_subplot()
We can think of the Figure object as the frame we want to put plots into, and the Axes object as an actual plot in our frame. We then add the scatter chart to the Axes object and use plt.show() to display the chart.
ax.scatter(x, y)
plt.show()
This is the gist of it!
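As a side note, the two-step setup above has a common shorthand: plt.subplots() creates the Figure and a single Axes in one call. The sketch below repeats the sine scatter chart that way; the variable name points is just an illustrative choice.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-3, 3, num=10)
y = np.sin(x)

# plt.subplots() returns a Figure and one Axes, replacing
# plt.figure() followed by fig.add_subplot().
fig, ax = plt.subplots()
points = ax.scatter(x, y)

# plt.show()  # uncomment to display the chart
```

Both approaches give you the same kind of Figure and Axes objects, so everything else in this post works unchanged with either one.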
Line charts
Line charts are drawn with the ax.plot function. Here are examples of colours that we can use. We can specify colours in many different ways: hex codes, RGB tuples, or plain old names.
from scipy.stats import norm
x = np.linspace(-4, 4, num=100)
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.plot(x, norm.pdf(x, loc=-1, scale=1), color="magenta")
ax.plot(x, norm.pdf(x, loc=0, scale=1), color=(0.85, 0.64, 0.12))
ax.plot(x, norm.pdf(x, loc=1, scale=1), color="#228B22")
plt.show()
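To see that these notations are interchangeable, one option is to convert them with the matplotlib.colors module. The following is a small sketch rather than part of the chart above; it relies on the named colour forestgreen being defined as the hex code #228B22.

```python
import matplotlib.colors as mcolors

# Convert three colour notations to RGBA tuples so they can be compared.
named = mcolors.to_rgba("forestgreen")
hex_code = mcolors.to_rgba("#228B22")
rgb_tuple = mcolors.to_rgba((0.85, 0.64, 0.12))

print(named == hex_code)  # the name and the hex code describe the same colour
print(rgb_tuple)          # an RGB tuple gains an alpha (opacity) channel
```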
There are also many predefined linestyles that we can use. Note that without defining colours, Matplotlib will automatically choose some distinct default colours for our lines.
x = np.linspace(-6, 6, num=100)
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.plot(x, norm.pdf(x, loc=-3, scale=1), linestyle="solid")
ax.plot(x, norm.pdf(x, loc=-1, scale=1), linestyle="dotted")
ax.plot(x, norm.pdf(x, loc=1, scale=1), linestyle="dashed")
ax.plot(x, norm.pdf(x, loc=3, scale=1), linestyle="dashdot")
plt.show()
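The named linestyles also have short aliases, and a custom dash pattern can be given as a tuple. Here is a hedged sketch using sine curves instead of the normal densities above, so that it only needs numpy; the specific dash lengths are arbitrary illustrative values.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, num=100)

fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()

# Short aliases: "--" means "dashed" and ":" means "dotted".
ax.plot(x, np.sin(x), linestyle="--")
ax.plot(x, np.sin(x - 1), linestyle=":")

# Custom pattern: (offset, (on, off, on, off, ...)) segment lengths.
ax.plot(x, np.sin(x - 2), linestyle=(0, (5, 2, 1, 2)))

# plt.show()  # uncomment to display the chart
```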
We can also adjust the width of our lines!
x = np.linspace(-2, 9, num=100)
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
for i in range(1, 7):
    ax.plot(
        x, norm.pdf(x, loc=i, scale=1), color="black", linewidth=i/2
    )
plt.show()
Scatter charts
For scatter charts, we can change the markers and their size. Here’s an example:
x = np.linspace(-4, 4, num=20)
y1 = x
y2 = -y1
y3 = y1**2
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.scatter(x=x, y=y1, marker="v", s=1)
ax.scatter(x=x, y=y2, marker="X", s=5)
ax.scatter(x=x, y=y3, marker="s", s=10)
plt.show()
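One detail worth knowing about the s parameter: it is the marker area in points squared, and it also accepts an array with one size per point. The sketch below illustrates this with an arbitrary range of sizes chosen purely for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-4, 4, num=20)
y = x**2

# s is the marker area in points squared; an array gives each
# point its own size (these values are arbitrary).
sizes = np.linspace(10, 200, num=20)

fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
points = ax.scatter(x, y, marker="o", s=sizes)

# plt.show()  # uncomment to display the chart
```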
We can also combine line and scatter charts using the ax.plot function by changing the fmt parameter. The fmt parameter consists of a part for marker, line, and colour: fmt = [marker][line][color]. If fmt = "s--m", then we have square markers and a dashed line, both coloured magenta. The example below uses 'H-g': hexagon markers and a solid line in green.
x = np.linspace(-2, 2, num=20)
y = x ** 3 - x
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.plot(x, y, 'H-g')
plt.show()
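If the format string feels cryptic, the same styling can be spelled out with keyword arguments. This sketch draws the "s--m" example from the text both ways; the variable names short and explicit are only illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-2, 2, num=20)
y = x**3 - x

fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()

# Format string: square markers, dashed line, magenta.
(short,) = ax.plot(x, y, "s--m")

# The same styling written out with keyword arguments
# (shifted up by 1 so both lines are visible).
(explicit,) = ax.plot(x, y + 1, marker="s", linestyle="--", color="m")

# plt.show()  # uncomment to display the chart
```

The keyword form is longer, but it tends to be easier to read back later and lets you use colours that have no one-letter code.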
Histograms
We can make histograms easily using the ax.hist function.
x = np.random.randn(10000)
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.hist(x)
plt.show()
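Besides drawing the chart, ax.hist also returns the numbers behind it: the bin counts, the bin edges, and the drawn bars. A minimal sketch follows, using a seeded random generator (an assumption made here for reproducibility, not something the example above does) and an arbitrary choice of 30 bins.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so reruns give the same data
x = rng.standard_normal(10000)

fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()

# ax.hist returns the bin counts, the bin edges, and the bar artists.
counts, edges, bars = ax.hist(x, bins=30)

print(counts.sum())  # every sample lands in exactly one bin
print(len(edges))    # one more edge than there are bins

# plt.show()  # uncomment to display the chart
```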
We can change a lot of things in the histogram to make it nicer, and we can even draw several histograms on the same chart!
x1 = np.random.randn(10000)-1
x2 = np.random.randn(10000)+1
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.hist(
    x1,
    color='turquoise',
    edgecolor='none',
    bins=50,
    alpha=0.5,
    density=True
)
ax.hist(
    x2,
    color='magenta',
    edgecolor='none',
    bins=200,
    alpha=0.5,
    density=True
)
plt.show()
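Two things happen in the example above that are easy to miss. With density=True, the bars are rescaled so that their total area is 1, and because the two histograms use different bin counts, their bars have different widths. The sketch below is one possible variation, not the original chart: it passes the same explicit bin edges to both calls (the range -6 to 6 is an arbitrary choice) and checks the area.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
x1 = rng.standard_normal(10000) - 1
x2 = rng.standard_normal(10000) + 1

# One set of bin edges shared by both histograms (60 equal-width bins).
edges = np.linspace(-6, 6, num=61)

fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
d1, _, _ = ax.hist(x1, bins=edges, color="turquoise", alpha=0.5, density=True)
d2, _, _ = ax.hist(x2, bins=edges, color="magenta", alpha=0.5, density=True)

# With density=True, bar height times bin width sums to 1.
area1 = np.sum(d1 * np.diff(edges))
area2 = np.sum(d2 * np.diff(edges))

# plt.show()  # uncomment to display the chart
```

Sharing bin edges makes the two distributions directly comparable bar by bar, which can matter when the histograms overlap.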
Legends
Naturally, we’ll want to add a legend to our chart. This is simply done with the ax.legend function, which uses the label we pass to each plotting call.
x = np.linspace(-2, 2, num=100)
y1 = x
y2 = x**2
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.plot(x, y1, color='turquoise', label='First')
ax.plot(x, y2, color='magenta', label='Second')
ax.legend()
plt.show()
Matplotlib will automatically try to find the best position for the legend on your chart, but we can change it by providing an argument for the loc parameter. Also, a common preference is to not have a frame around the legend, and we can disable it by setting the frameon parameter to False. Additionally, Matplotlib lists the elements of the legend in one column by default, but we can provide the number of columns to use in the ncol parameter.
x = np.linspace(-2, 2, num=100)
y1 = x
y2 = np.sin(x)+np.cos(x)
y3 = x**2
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.plot(x, y1, color='turquoise', label='First')
ax.plot(x, y2, color='magenta', label='Second')
ax.plot(x, y3, color='forestgreen', label='Third')
ax.legend(loc='lower center', frameon=False, ncol=3)
plt.show()
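When no position inside the chart works, the legend can also be pushed outside the plotting area by combining loc with the bbox_to_anchor parameter. This is a sketch of one common pattern; the anchor point (1.02, 1) is an illustrative value you would likely tune for your own chart.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-2, 2, num=100)

fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.plot(x, x, color="turquoise", label="First")
ax.plot(x, x**2, color="magenta", label="Second")

# loc names the corner of the legend box that gets pinned, and
# bbox_to_anchor says where that corner goes in axes coordinates
# (x slightly above 1 means just right of the plot).
legend = ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1), frameon=False)

labels = [text.get_text() for text in legend.get_texts()]
print(labels)

# plt.show()  # uncomment to display the chart
```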
Final tips
There are so many quirks and different things you can do with Matplotlib that I cannot cover them all here. Here are a few guidelines to get you started:
- You save charts with the plt.savefig() function.
- There are libraries that build on Matplotlib, such as Seaborn, as well as independent plotting libraries such as Bokeh and Plotly, that could be beneficial to the specific chart you’re trying to create.
- Look at the Matplotlib gallery. Please, please, look at the gallery! Don’t waste 3 hours working on a chart if someone has already made it.
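The saving tip can be sketched as follows. This version calls savefig on the Figure object, which fits the object-oriented approach used throughout; the output path, file name, and dpi value are hypothetical choices for illustration.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-3, 3, num=50)

fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
ax.plot(x, np.sin(x))

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "sine_chart.png")

# The file format is inferred from the extension. dpi sets the
# resolution and bbox_inches="tight" trims surrounding whitespace.
fig.savefig(out_path, dpi=150, bbox_inches="tight")
```

It is usually safest to save before calling plt.show(), since with some interactive backends the figure is discarded once its window is closed and a later save can come out blank.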