Scipy : Additional scientific computing modules

Numpy is one of the core modules of the Scipy “ecosystem”; however, additional numerical routines are packages in the separate scipy module.

The sub-modules contained in scipy include:

  • scipy.cluster : Clustering and cluster analysis
  • scipy.fftpack : Fast fourier transform functions
  • scipy.integrate : Integration and ODE (Ordinary Differential Equations)
  • scipy.interpolate : Numerical interpolation including splines and radial basis function interpolation
  • scipy.io : Input/Output functions for e.g. Matlab (.mat) files, wav (sound) files, and netcdf.
  • scipy.linalg : Linear algebra functions like eigenvalue analysis, decompositoins, and matrix equation solvers
  • scipy.ndimage : Image processing functions (works on high dimensional images - hence nd!)
  • scipy.optimize : Optimization (minimize, maximize a function wrt variables), curve fitting, and root finding
  • scipy.signal : Signal processing (1d) including convolution, filtering and filter design, and spectral analysis
  • scipy.sparse : Sparse matrix support (2d)
  • scipy.spatial : Spatial analysis related functions
  • scipy.statistical : Statistical analysis including continuous and discreet distributions and a slew of statistical functions

While we don’t have time to cover all of these submodules in detail, we will give a introduction to their general usage with a couple of exercise and then a sample project using a few of them to perform some 1d signal processing.

Exercise : Curve fitting

Curve fitting represents a relatively common research task, and fortunatley Python includes some great curve fitting functionalities.

Some of the more advanced ones are present in additional Scikit libraries, but Scipy includes some of the most common algorithms in the optimize submodule, as well as the odr submodule, which covers the more elaborate “orthogonal distance regression” (as opposed to ordinary least squares used in curve_fit and leastsq).

Load the data contained in the following file

curve_fitting_data_simple

and use the scipy.optimize.curve_fit or scipy.optimize.leastsq (least-squares) fitting function (curve_fit calls leastsq but doesn’t give as much output!) to generate a line of best fit for the data. You shold evaluate several possible functions for the fit including

  • A straight line f(x,a,b) = a*x + b
  • A quadratic fit f(x,a,b,c) = a*x^2 + b*x + c
  • An exponential fit f(x,a,b,c) = a*e^(b*x) + c

Generate an estimate of the goodness of fit to discriminate between these possible models.

Bonus section

Try the same again with the following data (you should be able to just change the filename!)

curve_fitting_data_noisy

You should find that the goodness of fit has become worse. In fact the algorithm used by Scipy’s optimize submodule is not always able to determine which of the models fits best.

Try using the scipy.odr submodule instead; a guide for how to do this can be found here

Exercise : Interpolating data

Another common use of Scipy is signal interpolation.

Background : Interpolation

Interpolation is often used to resample data, generating data points at a finer spacing than the original data.

A simple example of interpolation is image resizing; when a small image is resized to a larger size, the “in-between” pixel values are generated by interpolating their values based on the nearest original pixel values.

Typical interpolation schemes include nearest, linear, and spline interpolation of various degrees. These terms denote the functional form of the interpolating function.

  • nearest neighbor interpolation assigns each interpolated value the value of the nearest original value
  • linear interpolation assigns each interpolated value the linearly weighted combination of the neighbouring values; e.g. if the value to be interpolated is 9 times closer to original point B than original point A, then the final interpolated value would be 0.1A + 0.9B.
  • Spline interpolation approximates the region between the original data points as quadratic or cubic splines (depending on which spline interpolation order) was selected.

Download the data from here:

data_interpolation

and use “linear” interpolation to interpolate the data to increase so that the new “x axis” has five times as many points between then original limits (i.e. you’ll resample the data to 5 times the density).

Bonus

Run the interpolation with “quadratic” and “cubic” interpolation also (it helps if your interpolation code was in a function!), and you should see much better results!

Guided exercise: Signal analysis

As you may not have a background in signal processing, as our last exercise, we’ll you through the process instead of leaving you to generate the code from scratch.

Download the wav file (uncompressed audio) from here Wav File.

This sound file will be the basis of our experimentation with some of the additional Scipy modules

Loading data : scipy.io

scipy.io contains file reading functionality to load Matlab .mat files, NetCDF, and wav audio files, amongst others.

We will use the wavfile submodule to load our audio file;

import scipy.io.wavfile as scwav
data, rate = scwav.read("audio_sample.wav")

Next we can inspect the the loaded data using a plot (plotting just the first 500 values),

import matplotlib.pyplot as plt
# Create a "time" axis array; at present there are
# `rate` data-points per second, as rate corresponds to the
# data sample rate.
Nsamples = len(data)
time_axis = np.arange(Nsamples)/rate
plt.plot( time_axis[:500], data[:500])
plt.show()

We should see an audio trace that looks something like Audio trace

If you have headphones with you and play the audio, you’ll hear a noisy, but still recongnizable sound-bite.

Let’s see if we can improve the sound quality by performing some basic signal processing!

Cleaning the data using : scipy.signal

There are a host of available 1d filters available in Scipy. We’re going to construct a simple low-pass filter using the firwin function;

import scipy.signal as scsig

# Calculate the Nqyist rate
nyq_rate = rate / 2.
# Set a low cutoff frequency of the filter: 1KHz
cutoff_hz = 1000.0
# Length of the filter (number of coefficients, i.e. the filter order + 1)
numtaps = 29
# Use firwin to create a lowpass FIR filter
fir_coeff = scsig.firwin(numtaps, cutoff_hz/nyq_rate)

# Use lfilter to filter the signal with the FIR filter
filtered = scsig.lfilter(fir_coeff, 1.0, data)

# Write the filtered version into an audio file so we can see
# if we can hear a difference!
scwav.write("filtered.wav", rate, filtered)

While “manual” verification by listening to the filtered signal can be useful, we would probably want to also analyze both the original signal and the effect the filtering has had.

Spectral analysis : scipy.fftpack and scipy.signal

To analyze a 1d filter, we often generate a periodogram, which essentially gives us information about the frequency content of the signal. The perriodogram itself is a power-spectrum representation of the Fourier transform of the signal; however, this is not a detailed course in 1d signal processing, so if you have no idea what that means then don’t worry about it!

For sound signals, the periodogram relates directly to the pitches of the sounds in the signal.

Let’s generate and plot a periodogram using scipy:

plt.figure()
f, Pxx = scsig.periodogram(data.astype(float), rate)
plt.semilogy(f, Pxx)
plt.xlabel('frequency [Hz]')
plt.ylabel('PSD [V**2/Hz]')
plt.show()
plt.figure()
f, Pxx = scsig.periodogram(filtered.astype(float), rate)
plt.semilogy(f, Pxx)
plt.xlabel('frequency [Hz]')
plt.ylabel('PSD [V**2/Hz]')
plt.show()

Original frequency content Final frequency content

For those wanting more control over the spectral analysis, the fftpack submodule offers access to a range of Fast-Fourier transform related functions.

Performing the spectral analysis shows us that the signal did indeed contain a high frequency noise component that the filtering should, at least to some degree have helped remove.