python-intro

Introduction I

XKCD comic

Introduction

These notes are intended as a light introduction and guide to learning how to program in Python.

Just as important as any of the formal aims listed below is that you should enjoy yourself. Python is a fun language because it is relatively easy to read and tells you all about what you did wrong (or which module was broken) if an error occurs.

With that in mind, have fun, and happy learning!

Structure of this course

The main components of this workshop are these notes and accompanying exercises.

In addition you will receive a brief introductory talk, and we will work through the first exercise together to make sure that you are able to write and run a basic Python script.

From there, you’ll be left to work through the material at your own pace with valuable guidance and advice available from the workshop demonstrators - use them!

Where appropriate, key points will be emphasized via short interjections during the workshop.

Workshop Slides

The workshop slides can be accessed here

Introduction II

What is Python?

Python is the name of a programming language (created by Dutch programmer Guido van Rossum as a hobby programming project!), as well as the program, known as an interpreter, that executes scripts (text files) written in that language.

Van Rossum named his new programming language after Monty Python’s Flying Circus (he was reading the published scripts from “Monty Python’s Flying Circus” at the time of developing Python!).

It is common to use Monty Python references in example code. For example, the dummy (aka metasyntactic) variables often used in Python literature are spam and eggs, instead of the traditional foo and bar. As well as this, the official Python documentation often contains various obscure Monty Python references.

Jargon

The program is known as an interpreter because it interprets human-readable code into computer-readable code and executes it. This is in contrast to compiled programming languages like C, C++, and Java, which split this process up into a compile step (conversion of human-readable code into computer code) and a separate execution step, which is what happens when you double-click on a typical program executable, or run a Java class file using the Java Virtual Machine.

Because of its focus on code readability and its highly expressive syntax, which means that programmers can write less code than would be required with languages like C or Java, Python has grown hugely in popularity and is now one of the most popular programming languages in use.

Added Bonus!

Due to its popularity, Python is available for all major computing platforms, including but not limited to:

  • Windows
  • MacOS - includes a version installed by default
  • Linux - includes a version installed by default in many distributions
  • Android - via several Android apps e.g. QPython, Kivy, Pygame, SL4A.
  • Plus Solaris, Windows CE, RISC OS, iOS (iPhone - via apps) and more

Why Python? : Motivation

Now that we know roughly what Python is, why is Python of interest to us as researchers?

For users of specialist environments like Matlab, Stata, R, the answer might be because in most cases Python offers similar performance and range of functions, while providing a much wider range of additional functionality. Plus compared with Matlab or Stata, Python is open-source and free.

Python venn diagram

If you come from a low-level (or lower-level) computing background like C++, Java, or Fortran, then Python is great at reducing development and prototyping time. The ability to “glue” together routines written in Fortran or C++ at the programming level means Python offers the best of both worlds.

Lastly, if you’re not from either of these backgrounds, then let’s provide a sample of what you can do with Python for a typical research project:

… and why NOT Python?

As much as I think Python is a fantastic programming framework for many tasks, it’s important to pay attention to its limitations and to scenarios where we might not want to use Python.

* An important note here though is that Python has several mechanisms that allow integrating with compiled libraries; in fact most of the numerical computing functionality comes from compiled C code! Matlab has similar capabilities via “MEX” functions (though to my knowledge the interface is a little more cumbersome). R also has similar interface functionality.

Example

With just 13 lines of Python (plus comments), we are able to write a realistic script to loop over all CSV files in a folder (and subfolders), and generate a statistical plot for each one, including titles etc!

Sample result

(“Time-series” generated using NumPy’s random number generator).

In addition, this was using general numerical libraries; with a specialist library like Pandas this could probably have been reduced further.

While these modules won’t be covered until the advanced sessions, these introductory sessions lay the groundwork for being able to use these modules.

“Real world” Example - Attendee-contributed

Given a task like:

What I would like to do is read an xls file and see if any items in one column are also in a particular column of another xls file.

The real world issue is we get a daily data dump of FRUIT which have GONE BAD and I want to cross reference this against my FRUIT inventory. I can easily turn both into csv files of course. I started to write a script in Python, but have never found the extra 30mins or so I need to finish it.

How can we achieve this, and can we do so with just what we learn in this workshop?

The task turns out to have a simple solution, as well as some more concise approaches if we can use more advanced Python and/or modules.

#
# Simple Python Version
#
print("\n\nSimple Python Version")
# Open data files for reading
fin1 = open("data_sheet1.csv")
fin2 = open("data_sheet2.csv")
# Create empty lists to store contents and overlap
col1 = []
col2 = []
overlap = []
# Read in the files, discarding spaces and removing the comma
for line in fin1:
    col1.append(line.strip().strip(","))
for line in fin2:
    col2.append(line.strip().strip(","))
# Add an item in col2 to the overlap if it is in col1
for cell in col2:
    if cell in col1:
        overlap.append(cell)
# Show what the overlap items are
for cell in overlap:
    print(cell)
# Close the files
fin1.close()
fin2.close()

#
# More advanced Python Version A - order not preserved
#

print('\n\nMore "pythonic" python version -version 1 - order not preserved')
col1 = set(line.split(",")[0] for line in open("data_sheet1.csv"))
col2 = set(line.split(",")[0] for line in open("data_sheet2.csv"))
overlap_2 = col2.intersection(col1)
print("\n".join(overlap_2))

#
# More advanced Python Version B - order preserved
#

print('\n\nMore "pythonic" python version -version 2 - order preserved')
col1 = [line.split(",")[0] for line in open("data_sheet1.csv")]
col2 = [line.split(",")[0] for line in open("data_sheet2.csv")]
overlap_2 = [ cell for cell in col1 if cell in col2 ]
print("\n".join(overlap_2))

#
# Using modules version
#
print("\n\nModules version (using pandas)")
import pandas as pd
# header=None tells pandas that the files have no header row
df1 = pd.read_csv("data_sheet1.csv", header=None)
df2 = pd.read_csv("data_sheet2.csv", header=None)
overlap_3 = pd.merge(df1, df2, how="inner", on=[0])[0]
print(overlap_3)

Given input data sheet 1

Apple,
Banana, 
Mango,
Raspberry,
Blueberry,
Passionfruit,
Cherry,
Pear,

and sheet 2:

Mango, 
Red Herring,
Cherry,

The whole script then produces the output

Simple Python Version
Mango
Cherry


More "pythonic" python version -version 1 - order not preserved
Cherry
Mango


More "pythonic" python version -version 2 - order preserved
Mango
Cherry


Modules version (using pandas)
0     Mango
1    Cherry

If you would like to run this example, you may download the data sheets from here:

data_sheet1.csv data_sheet2.csv

Aims

This course aims to teach you how to use basic Python including

We will not be delivering hours of lectures on programming constructs and theory, or detailing how every function of every module works.

Instead the aim of this workshop is to provide an environment for you to learn to program, with help at hand when you need it, and some introductory exercises and notes to help you get started.

Printing the notes

For both environmental reasons and to ensure that you have the most up-to-date version, we recommend that you work from the online version of these notes instead of print-outs. However, while there are no plans to ever take these notes offline, you may wish to save them to PDF (via the print to PDF functionality) to safeguard against such an eventuality.

A printable, single page version of these notes is available here.

Errata

Please email any typos, mistakes, broken links or other suggestions to j.metz@exeter.ac.uk.

Installing on your own machine

If you want to use Python on your own computer I would recommend using one of the following “distributions” of Python, rather than just the basic Python interpreter.

Amongst other things, these distributions take the pain out of getting started because they include all of the modules you’re likely to need to get started as well as links to pre-configured consoles that make running Python a breeze.

Note: Be sure to download Python 3, not Python 2, and get the correct architecture for your machine (i.e. 32 or 64 bit).

Getting started

Before we dive into Python, let’s get familiar with the environment we are going to use to program and run Python. The two main components you will need to use Python are an editor and a terminal, each described below.

*nix Users

If you are continuing on from the UNIX/Linux course and would like to continue to use that, or are using your own Linux machine or MacOS, you should already be familiar with your terminal program and editor.

For the remainder of these notes we will, where needed, show instructions for both Windows and *nix (*nix being a common term for “unix-like”).

An editor

As Python code is human readable text, we need a text editor of some sort to read and edit Python code.

Jargon

If the text editor has features like syntax highlighting (colour coding words in the code based on whether they refer to functions, known keywords, etc), code completion, and other goodies, it’s called a code editor. If it is embedded in an interface that also has a terminal and sometimes a variable browser, the whole program is referred to as an Integrated Development Environment (IDE). Spyder and PyCharm are two such IDEs specific to Python.

For the Windows users amongst you, we will be using Notepad++ as this is similar to Notepad but adds things like syntax highlighting.

In the Start Menu, find Notepad++, either by looking through the programs or by using the search field.

*nix users

If you are already comfortable with your editor of choice, keep using that. For the rest of you pluma is the standard text editor included with the MATE desktop environment (gedit on other systems).

The terminal

In order to run Python scripts we will use a pre-configured command prompt provided by WinPython.

In the Start Menu, find WinPython Command Prompt, either by looking through the programs or by using the search field.

To run a script from the terminal use

python scriptname.py

*nix users

Luckily for you, Linux (and to some extent MacOS) systems make development much more straightforward!

The Ubuntu systems on OpenStack have had Python installed on them, and all terminals get preconfigured by the installation processes.

On MacOS, you will need to use the Anaconda terminal (or have correctly configured your standard terminal to use Anaconda instead of the built-in Python).

Getting help

As well as asking the demonstrators, you are encouraged to get used to using online resources. Simply searching for e.g.

python FUNCTIONNAME

(and replacing FUNCTIONNAME with the name of the function you want help on!) using your favourite search engine will almost always return relevant help.

While the demonstrators are there to help you get started and provide detailed help when you need it, it will be very beneficial to you in the long run to become familiar with what online sources there are and how to optimize your searches to most quickly find the answers you need.

Resources I often use are:

Advanced Users

Another simple way of getting help is to use the interactive help system in the IPython console. The IPython console is an interactive Python session, i.e. it looks like a terminal but instead of accepting terminal commands, it accepts Python code directly. The IPython console has several useful features to get help including

  • help(FUNCTIONNAME) prints help on the function called FUNCTIONNAME
  • FUNCTIONNAME? prints help on the function called FUNCTIONNAME
  • MODULENAME. and then pressing tab (twice) shows a list of all functions available in the module called MODULENAME (if it’s imported).
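Outside IPython, the built-in help and dir functions give similar information in any Python session. A minimal sketch, using the standard math module and its sqrt function purely as illustrative choices:

```python
import math

# help() prints the documentation for a function to the terminal
help(math.sqrt)

# dir() returns a list of the names available in a module
names = dir(math)
print("sqrt" in names)
```

Note that the FUNCTIONNAME? form is specific to IPython and is not valid in a plain Python script.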

Writing your first Script

Organization of scripts

Before we write anything, let’s create a folder to hold your Python scripts.

Usually you would choose a hierarchy that’s sensible for you (for example I use Documents/programming/python in my home directory as the root for all of my Python projects!).

For the purposes of this workshop, let’s use your Desktop folder in your U drive and create a folder called

python_workshop 

*nix users

Similar to above, but place the python_workshop folder in your home folder (e.g. /home/ubuntu for OpenStack users).

NB

It’s a slightly confusing convention, but a user’s home folder is the path /home/username, not simply /home.

What is a Python Script?

A Python script is just a plain text file, and the convention is to use the extension .py (instead of e.g. .txt) to let programs know that it holds Python code.

Python code is very close to something called pseudo-code, which is what people use when detailing the main components of an algorithm.

For example, the pseudo-code for the factorial function (e.g. 3! = 3 x 2 x 1) is

SET fact to n
WHILE n is more than 1
    SET fact to fact times (n - 1)
    SET n to n - 1

while the python code is

fact = n
while n > 1:
    fact = fact * (n-1)
    n    = n - 1
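The Python snippet above assumes that n already holds a value. A minimal runnable sketch, with the starting value 5 chosen purely for illustration, might look like this:

```python
n = 5          # example starting value (an assumption for this sketch)
fact = n
while n > 1:
    fact = fact * (n - 1)
    n = n - 1

print(fact)    # 5! = 5 x 4 x 3 x 2 x 1 = 120
```

Saving this in a .py file and running it with python should print 120.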

What this simple example illustrates is that Python is extremely readable; it just takes becoming familiar with a few base syntax rules (~grammar).

We’ll be speaking Python in no time!

Worked Exercise : Hello, world!

We’ll start by creating a blank Python script file.

Creating a file

We’re going to name our first script file exercise_hello_world.py and keep it inside the newly created python_workshop folder.

To do this, open Notepad++. You should see a blank file (that may be named “new 1”, or “new 2” etc, depending on if you closed any tabs!).

Starting Notepad++

If you don’t see a blank file, select File->New from the menu bar.

Then select File->Save As, navigate to the python_workshop folder we created a few minutes ago, and set the file name to exercise_hello_world.py and click Save.

Now that we have a blank Python script file, let’s start adding some code!

Initial content

First of all, enter:

    # Author: Your Name <your@email.address>
    # This is a script to test that Python is working

replacing the text in the line starting # Author with your details.

Running the script with Python: The Terminal

Now let’s see what running this through Python does!

Start a customized command prompt (reminder: in the Windows File Explorer, find the WinPython3 folder on the C: drive, and click on WinPython Command Prompt.exe).

A terminal window should pop up, that looks a little bit like

Terminal Window

Reminder: Basic terminal usage

You were advised to have basic knowledge of using a terminal (Windows Command Prompt/Linux Terminal/MacOS Terminal); you are about to find out why!

Here’s a recap of the things you’re most likely to need.

| Windows | MacOS / Linux | What it does |
|---|---|---|
| `cd FOLDER_NAME` | `cd FOLDER_NAME` | Change directory to FOLDER_NAME |
| `dir FOLDER_NAME` | `ls FOLDER_NAME` | List folder contents; if FOLDER_NAME is omitted, list current folder contents |
| `..` | `..` | Reference to parent folder. E.g. `cd ..` is how you would navigate from `/a/b/c/` to `/a/b/` if you are currently in `/a/b/c/`. |
| `mkdir FOLDER_NAME` | `mkdir FOLDER_NAME` | Create a folder called FOLDER_NAME |

Quick note on terminology

Folder and directory refer to the same thing, while full path or absolute path means the full directory location. E.g. if you’re currently in your Desktop folder, the folder is Desktop, but the full path is something like /users/joe/Desktop. If you’re on Windows the path starts with a drive letter too, like “C:” or “U:”, and the forward-slashes will be backslashes instead.

Console and terminal (and sometimes shell) are usually used interchangeably to mean the same thing: the text-based interface where commands can be entered. In Windows, the built-in console is also called the “command prompt” and is started using cmd.exe.

For our purposes, we’re going to be mainly interested in the terminal console which is where we type commands like cd, or dir.

For interactive Python snippet testing we can also use the Interactive Python console, which is where we can directly type Python commands. You might encounter this later; for now just be aware that there are these two types of console.

Now using the terminal command to change directory, cd, navigate to your Desktop directory.

Navigate to Desktop

You can verify that it contains your new python_workshop folder by using the Windows terminal command dir:

dir

should list

python_workshop

in the output.

Change directory into the python_workshop folder using

cd python_workshop

and verify that our new file is there using dir.

If you see your file (exercise_hello_world.py) listed, great! If not, check the previous steps carefully and/or ask a demonstrator for help.

Once the terminal is in the correct directory, we’re ready to run Python on our file.

As the terminal is preconfigured (meaning that it knows all about the Python program and where to find it) we can simply type python ... to run the Python interpreter, replacing “…” with input arguments.

In most simple use cases, we just use a single input argument; the script file name.

In advanced usage cases, we can also add in additional command line arguments to the script, but this will be covered in an advanced exercise in the follow-on workshop.

We can now type

python exercise_hello_world.py

to get Python to run our script file:

Show Desktop contents

We should get no output - Python has interpreted and run our script file, but as the script only contained comments, no terminal output was produced!

Comments

Comments are used to make notes about things like what each few lines of code are doing. In our case, we also added an initial comment that keeps track of who wrote the script. Comments are created by using the hash symbol, #.

A comment can take up a whole line as in our script above, or only part of a line; we’ll see an example of this later.
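As a small sketch of the two forms of comment (the printed text is chosen only for illustration):

```python
# A whole-line comment: Python ignores everything on this line
print("Hello")  # a part-line comment: Python ignores the text after the hash

# A hash inside a string is part of the string, not the start of a comment
print("Item #1")
```

Running this prints Hello and then Item #1; none of the comment text appears in the output.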

Adding functionality

Now that we have a script file that contains a couple of lines of comment, and successfully runs with Python (i.e. does nothing!), let’s add some functionality.

Switch back to the editor window (Notepad++) and add an empty line (for readability). Then, on the fourth line of the script add the text

print("Hello world from YOURNAME")

replacing the placeholder YOURNAME with your actual name.

Switch to the terminal window, and repeat the python command

Tip

On many terminals, you can press the Up arrow key to cycle through previous commands. This will save you from having to type the command each time!

Tip

On several desktop environments (including Windows), you can cycle between open windows using “Alt + Tab” (or “Alt + Shift + Tab” to cycle in the other direction); this saves you from having to use the mouse between editing and running code.

Hello, world!

Hurrah! We got Python to output text to the terminal. This may not seem like much of an achievement, but once you understand this line of code, you’re well on your way to being able to program in Python.

So let’s have a look.

Anatomy of our script

Lines 1 & 2

As mentioned above, lines 1 and 2 are comments, which are non-executing lines of text that are used for us to be able to understand our code. They may seem pointless now, but if you give your script to a colleague who’s never touched a program before, if they read the first couple of lines they will immediately know who wrote the script, and why.

Comments become much more useful as scripts grow; “future you” may well benefit from well commented code as you look back over a script and try to remember what you were doing and why!

Line 4

Our first line of Python code contains two of the major concepts of this course: a function call and a data type.

Calling a function

The function being called, or executed, is named print, and the data it is given as an argument is "Hello world from Joe" (with YOURNAME replaced by Joe). This data is of type string (more on this in the next section!).

What is a Function?

A function is a self-contained piece of processing; often functions take inputs and provide return values (but they don’t have to).

They provide a way to separate specific pieces of processing so that they can be reused over and over again.

If you’re familiar with the concept of a function from mathematics, programming functions can be similar: for example the sin trigonometric function generates an output number (between -1 and 1) for any input angle.

The print function does not generate any output values - it only causes its input to be “printed” to the terminal. For functions that do generate output values, these outputs are often captured by assigning them to variables - more on this later!

The syntax for calling a function is:
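As a sketch of the general pattern, a call consists of the function name followed by parentheses containing any arguments, separated by commas. The built-in functions print, len, and max are used here only as familiar examples:

```python
# General pattern: FUNCTION_NAME(ARGUMENT_1, ARGUMENT_2, ...)
print("Hello")            # one argument
print("Hello", "world")   # two arguments, separated by a comma
print()                   # no arguments: prints an empty line

# Functions that return a value can be used inside other calls
print(len("Hello"))       # len returns the number of characters: 5
print(max(3, 7))          # max returns the largest argument: 7
```

The parentheses are required even when there are no arguments, as in the print() call above.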

Outputting to the terminal using print

The print function is useful for providing output to the terminal - which is the most basic way of getting information out of a Python script.

The print function accepts a variety of input data types. For example we can write

print("Any string") 

as well as

print(3.147)

i.e. a number.

You may also pass multiple, comma-separated arguments to the print function. E.g.

print(10, "is bigger than", 2)

outputs:

10 is bigger than 2

Now that we know how to write a script, and how to run it with Python, let’s examine in more detail what goes into the script, starting with data types.

VITAL note on whitespace in Python scripts

Guido (the creator of Python) decided that code-readability is crucial for good programming, and that unlike most other languages where badly laid out code is still valid, in Python code must be laid out in a specific way.

By layout, we are referring to the whitespace (spaces or tabs) preceding text in code, known as the indentation:

Every "logical block"* of code in Python must be at the same indentation level.

*We’ll cover in more detail what we mean by “logical blocks” later on, when we look at loops and conditional execution of code.

For example

print("Hello")
print("World")

is perfectly fine, while

print("Hello")
  print("World")

would cause an indentation error.

While this feature of Python may seem petty or just irritating at first, many Python users grow to appreciate its significance in enforcing good coding practice.

Data types

At the end of the last section, we introduced the string data type as being an argument to the print function.

Two of the most basic data types in Python are strings and numbers.

Numerical data

Valid numbers like 10, 0.001, and 1E6 (a million in scientific notation) are all treated in the same way by Python.

Numbers can be operated on using standard arithmetic operators such as + (addition), - (subtraction), * (multiplication), and / (division).

Jargon

Unlike “statically typed” languages like C++ and Java, number data storage in Python is handled automatically and conversions done as needed.

For example 10 will be stored internally as an int, but 1/10 will result in 0.1 (float) while 10+1 will result in 11 (int).
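The following sketch illustrates basic arithmetic and the automatic choice between int and float described above; the particular numbers are arbitrary examples:

```python
print(10 + 1)        # addition: 11 (int)
print(10 - 1)        # subtraction: 9 (int)
print(10 * 2)        # multiplication: 20 (int)
print(1 / 10)        # division: 0.1 (float)
print(2 ** 3)        # exponentiation (2 to the power 3): 8 (int)

# The built-in type function reports how a value is stored
print(type(10))      # <class 'int'>
print(type(1 / 10))  # <class 'float'>
print(type(1E6))     # <class 'float'>
```

Note that in Python 3 the / operator always produces a float, even when both numbers are whole.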

Exercise : Using Python as a calculator

Write a script (name the file exercise_calculator.py) to output the result of the following operations:

Additional operations: comparison operators

In addition to the standard arithmetic operators in the previous section, you can perform comparisons on numerical data, which give Boolean (True/False) results. The comparison operators include > (greater than), < (less than), and == (equal to), as well as >= (greater than or equal), <= (less than or equal), and != (not equal).
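A short sketch of the comparison operators in use, with arbitrary example numbers:

```python
print(10 > 2)     # greater than: True
print(10 < 2)     # less than: False
print(10 == 10)   # equal to: True
print(10 >= 10)   # greater than or equal: True
print(3 <= 2)     # less than or equal: False
print(1 != 2)     # not equal: True
```

Note that equality is tested with a double equals sign, ==; a single = is used for assignment.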

Booleans: True & False

Here we introduced a new data type - the boolean (aka bool).

Boolean data is converted to 0 and 1 when performing any kind of arithmetic, e.g.

  • True + False gives 1
  • True/10 gives 0.1

Booleans can be thought of as being a sub-type of numerical data - where only 0 and 1 are represented.
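The arithmetic behaviour of booleans can be checked directly; a minimal sketch:

```python
print(True + False)   # 1
print(True / 10)      # 0.1
print(True + True)    # 2
print(True == 1)      # True
print(type(10 > 2))   # <class 'bool'>
```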

Strings

The term string is roughly speaking short for a string of characters, i.e. text. String data is enclosed in single or double quotes; the following are all valid Python strings

'I am a string'

"I'm a string too" (A double-quoted string can contain single quotes and vice-versa)

"""
And python accepts multi-line strings enclosed in 
triple quotes...(more on me in a while!) 
"""

Strings are one of the most basic “sequence” data types; we’ll encounter a few more in the next section.

Accessing individual characters in the String

To access individual characters in a string, we use index notation, which is represented using square brackets, [ ].

For example, to access the second character of a string we can use

"abcdefg"[1]

which gives access to the character “b”. This is because Python uses zero-indexing meaning that the first element is accessed using [0], as do most programming languages (a notable exception being Matlab, which uses 1-indexing, i.e. the first element is 1, not 0).

Accessing a range of characters

If instead of accessing a single character we want to access a range of characters, for example the first five characters in the string "Hello, world", we use what is called slice indexing:

"Hello, world"[0:5]

returns "Hello".

The syntax for slice indexing is [START_INDEX : END_INDEX_PLUS_ONE], e.g. if instead we had wanted the fifth to the eighth characters (inclusive) we would use

"Hello, world"[4:8]  

which returns "o, w"

By default, the START_INDEX is 0, and the END_INDEX_PLUS_ONE is the length of the string, so we could have written

"Hello, world"[0:5]

as

"Hello, world"[:5]

Both return "Hello".

Negative indexing

Lastly, a really handy indexing feature is negative indexing; the last character of a string is accessible using -1, the second last as -2, and so on. Negative indices can also be used as part of a slice, e.g. to access the last 5 characters we can use

"Hello, world"[-5:]

which returns "world".
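The indexing and slicing rules above can be tried out in a short script. This sketch reuses the "Hello, world" string from the examples; the built-in len function, which returns the number of characters, is included for reference:

```python
print("Hello, world"[0])      # first character: H
print("Hello, world"[1])      # second character: e
print("Hello, world"[-1])     # last character: d
print("Hello, world"[0:5])    # Hello
print("Hello, world"[:5])     # Hello (START_INDEX defaults to 0)
print("Hello, world"[7:])     # world (END_INDEX_PLUS_ONE defaults to the length)
print("Hello, world"[-5:])    # world
print(len("Hello, world"))    # 12
```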

String operations

There are a range of operations that can be performed with Strings.

These include some translations of the arithmetic operations:

as well as functions known as member functions which can be accessed using dot-notation, e.g.

The full list of member functions is:

capitalize    endswith      index         isidentifier  istitle       lstrip        rindex        split         title
casefold      expandtabs    isalnum       islower       isupper       maketrans     rjust         splitlines    translate
center        find          isalpha       isnumeric     join          partition     rpartition    startswith    upper
count         format        isdecimal     isprintable   ljust         replace       rsplit        strip         zfill
encode        format_map    isdigit       isspace       lower         rfind         rstrip        swapcase

More details on all of these methods can be found here.
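As a hedged illustration of both kinds of string operation, the sketch below uses the + and * operators together with a handful of the member functions listed above; the example strings are arbitrary:

```python
# Operators borrowed from arithmetic
print("spam" + "eggs")              # + joins strings: spameggs
print("spam" * 3)                   # * repeats a string: spamspamspam

# Member functions, accessed using dot-notation
print("spam".upper())               # SPAM
print("Spam and eggs".count("a"))   # number of times "a" occurs: 2
print("spam".replace("sp", "h"))    # ham
print("  spam  ".strip())           # removes surrounding whitespace: spam
print("a,b,c".split(","))           # ['a', 'b', 'c']
```

Member functions return a new value and leave the original string unchanged.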

Exercise : Using Python to analyse text

Write a script (name the file exercise_strings.py) to count the number of occurrences of the character “A”,
and also the number of occurrences of the sequence “AT” in the following string of text (tip: carefully double-click on the string to select the whole line, copy, and paste the string directly into the file).

'CGCCAATGCGGCAAGGATATGCGAAGTCTGGACTAATTCGGCTGACGTGTCCCTGCTTAGTGGTCTTCCACACTTGCGGATTCAGCCGTAAGTGGCGTATACCTCGTGAGTGCACAAGGCAGATGTGACCTACCGGGGTTTTATCATTAGACTTTTGGGGTGAGCCGGATGACCGATCGAAGCCCGAGTGCAATTGTCTCTCTCGAACGAAGAACGGAGGAGAAAACGTGTGTGGGGGCCTACCGCCATGCACAAACTAGACTGTCACTAAAACCGTGAAGCTACGCTGGCCTCCAGGCGGTATAAACCTTTCGATGTTAACAAGCAAAGAACCAATTCGCGTGAGTAGGCGGGCGTATGGCCCCACGAGCCTTGCACTTGTTTTCGAAATGAATCAGGACGCCTAATTATCAGAGGGAGGAGAAATGAGGCCAGCCAGCGACACTGGTCAAGGTACGGGCGGTCGCTAGTGCCCAACCAAAGGTAAGTTATTGCGATGGTCCAAAAGAAGGCACGTGTGGATACACTCGTTTATGAACGTTTCTACGGCAGATCAGGCCGACCTTCGATAATAACAAGCGGCGGGACGCACGACGGGACTCGCTGTCGGTCAGCTATGGCCATTCCTCGTAGGAGCCGCATCTATCTCGAACTAATTGATAGTTTGGTGTAAGTCCCCTCAGGTGTCACGCAACGAAGATGCGCTGAAGATTACTTTCGCACGGGTCACACGGAAGGAGTACTGTAGGGCGGAAGAGCACCGACTGAGGCCACAATCTCGAAGTACTGTGCTTTCGCTCTAACTCGGCTTACCCGTCTACCTGTCGCCTCCCTAGATCCAAATTGAATCCGCCCCCCGTGCTCTGTGACCCAGGACGTATACGGCGTTTAGGTTGTCCACAGCTAAAAACCAGAAAGCGACCGAGTGTATTCGAAATTTCGGTGGACCTTTCAACCTATAGGTCTTGTCGAATTCACTTGGGAGAACAACGCATGAAATTTGACGGATCGTGCACGTGATATAATGGGACTGCTTAATTGCGCCCCATTTTGGGAGCGCATTTGAACGCAAGCTCTGGGTCCCGCTATATATTAAGAAAAGTATGAAACGTTGTTACCATATCCGCACACTGGGATAGGTACGCAGATTTGTACTTGTATGCGTAACTGATTTTTCCCCTGACGGAGGGTCCGTTCCTCTGAGCCCCCGTCGTGCGATCCTGGGTGGCCACGTCTAAGCTGTCGCGAGCGAACATTATTTATGTTTATCTGCCAGACGAGCTTTGCCTACTTTCGAGGGGATGAAATTTAATTAAGCGATTTGAATATAAGGGGGTTTCATATGCCTAGATTACCTAGTGCGTTTATACAACTATGGTGAATAGAGGAGCAGTCCGAGTTAGAGGACAAACACTTTCGCAGGTGGCAAGTCGCACTAGCGAGTTGATTACGGACCACGAGGTATATTCAGGACATCAATTTTCCTGGGGGGATCATCTCCTCTTACTGTAGCAGCTTTTTTCTCTCCCTGCGGATTCAAAGCCCTTGTTCTGTCGCTGCCATTTAAAGGGAAAGGACTCGGAAGAACAGGTTCAGAGATTGGCAAAGACGGTCTTCTGTGCACTTTGATCATTGTGGCTTGAGGCGGGAGACACGAACGGCGCTAGCGACTCTCATCTACCAGCCTATTATATCCGCTCCCCTGGTTGAGTAAATACCTAATAAGGACTTTTGTCAGATTGACTTTCTGCAAGGGCAGGGATGGCATAGGAGATATTCACTAATAGGATGAACGTCGAAGGAGTAAATTGTTTGGAGTAATATTTTAATTCTCCTCCGCATAAAAACGTGCCTGACTAATGCTGACTGGAAATGACGTCATGGGGTGACATCCTGACAAGTATTCGACAGACGCAGAATGGCGACGGCGCACTCAGATTTAGTCCTCTTCTTCCGAGTAAATACTCGTACACCGCAAAGATTGAGGGCATAGGTAAGC
GTACAAAATCCGGTGTCATCGACCCAAGTAGAGACTACATGACGGGCCGTGAGGTGATCTGATCTTTGACTCTCCGTAAGGTGTCCCTAGGGGGTTCCCATGGTAACGGATTTGCGCTCAACCCGAAACTCGAACAACATCGAAATGAGTATAACGGTTAGAGGTTAGTGGGGGGTGCGAGTGCGGTGTTCCTACTGTACCCGAAGGATAGTCCTGTTTCATTCATATTGGAGATTACAGCCCCTAGAAGTGAGGGAACACGCCCGAGGCTTTCATGGCTACAGGTCGGGATGTCAGCCCCCTCTAAGGTTGGAAGCAATAGATCACCTATGTTAGATGGCAGCTGATTTCCACCTCCTGCCGAAGGTCCCATTATAGGCATCCCAAGGTGCAGTCGATACCCCAATTGTTCGCCTAGTGGTGGAGTGGCCATCTGTGGGGCATGTCATGAAGAACAGGCCACCTCGGCGACCCAACCTCCACTCAGTCGGTCCGCTGAAGTCTCGGAGCTCTAGTTGACGGAAGGCTTCGGGTTTCTCACCACCTGTCCGTAAGAGACCTGTATTGGTCGCACGCAGGAGGAAGACGGCTTACGATGTGTGGCTAATTCGCGTCCTCATGCCCAGCCATACTATGTTGTGACGCGATGACCTCAGCGGTTAATGCCTCTCCGCCAGTTGGATAGTTCGTTCTGGAAACCTGCAATACATCCTTTCGTGCTTGGCGTCTGATAAGAGTAAGGAACTTATTGAACGTTTACCCATAGCGGGCACTTCAAGTCTGGGCCCGAAGGGAACTCGTGATAGGGGGCGCAATGATATTCTGCTGTCTAAAAGCCACGACAAGGTCTCCACAAGTCAGGACGCCAATCCAACTAAATACTGCCGAAATGCGAGAATTCGTGCCCCCACGCACGTTCTAGGCGAGCGTTGGCGTCAGAAATACGTAAGACTGGTGGACTTTGAACAGGCAACGGGCAGCGACTATCGATAAAGTAAATCCCGCGATAGAAGTTACATCTCTTAGCCTCAGAGACTCATACCGGGCGTATCCGGTACGTCATCGCCATGGACCATTCCGGTAAGTCCATATCATATCGAACAGCCTTTACTACTGGAAACCCATCTTCCAGTACATGTCCGGAAATGGGACAATAGAAAACTGCGGTGCGTGAGCCTACTATAGTGTATCCCGGTATAGATTGGTGCTCAGGCAAAAGAGCTCTACGAGACAACGTCGACAGAGACAGGCGATCGTACGAGCGAGTAGGCATCACCTGCGGTGTTTGGACTATGTGAGGAGCATCAGGTCGTCTCTAAAGTATCGACTCTTCGTATTAGGCATCCACTCAAAATGAACCTTGCCCACGTCTCCTCCATCTCAGAGGATATGTCACGTCTGCCTACCTGAATGCCGACTGATTCGTCTACAACCACTAATACGGACGTAGTCTCCTCAAGAGTTACAGGTTAGATCCTTACCCATAATATCGGACAATCGTATCGGGTGGTGGTTAAGCGTCGGCGAGCTGTGGTTCAGTGCGATAGGGTTAACCCGCGTGTTCAACGCCCGGGCACAAGAAGTGAACTAGGCGTCTCGGTCCCGGAGGGTTGGATCCATTTACCATCGAGTACGAATTATGACTCCCTAAGTAATACCAAAAGGCCTAACCGGGCCAGGGCCCGTATCGCACCGACGCTCTGGGGTCCGCCTAGAGGTTGACCGCACGACAGGCCTCCTCCTATAGGCGGTTCCGCGTCGGACTACTATCGTCTGGTGTAAGACACTAAGCTCGAATCGACCACACGTAGATTATTTACGATCATGGTCGCTAGGGACCAGCTGTACAAGCTCGTAAACTTAACCTAGTCAGTATTTTGGACCTTTCAGGTGTACGCCGGAATTGAATTGTGGGCTTCAGCGAGCGATGTCCTTATTTAGCAATTCACGCACGGCGTACTCATATCGCTATAAGCGTGTCCGACCTAAGTGCGTTGGGC
ACTCCGTTCCTGAAAATGTTTTTCGCTGAATCTGGTGTAACCTGCGCGGCGGCATCTTATGAACATTAACCCGCGTCCAGGACGTAAGGATTCCGCACCCTAAGGAAACCGGGTCCGCTTATCAGTATCAGCTCATTGGAGGTTGAAACATTGCTTCCATCATGTCAAATGGTGCGGGAGCGTAGGCTCGTTCAAGGATCAAAGCCGCATGGTCGCCTGCTCTCTAGTTTCAAACTGTTAATAGGAAAACCGTGTACTATTAGAGGGTGGAATCCAAAGCCTTGTAGGGCATATAAGAGGGAAATTCTTTTTCCGGTGCTTAACCCAATGACTCCCTCCGGATAGCCTCACTAAATTCTGGCGATACAACTACTCGTTCGGGATTCTATTGCCTTCCGGATGGTTCCCTGTGCCTATAAGTTCGTTAACGGTGTACCTCGAACAGAATAAAAGTCCACCATGGAAATGGGATTCTCGGAGTGCTCCAGAATGATCTGTTAGCAGCTACGCCGCTGGTACTTCGTAATCCATTAAAGCGGTTTAGACTGCCAACTCCTCCGTGCGCAACAGATAGCCTCAACAATTTACGCCATCTGAGCGGACAGCATTTGATAAGGAATGTACATCACCGGGACTCCTTTTGTGGGAGTGCGGCACGGACGCGTTATGCCGAGTTCTCTAGCTACCCTGGCTAGAGAACCTAGGAGTGCACGTTCGTTTTGAACCCTAAACGTCCGATGCGACCCTTGAGTCGCAAACTGTGTAACATGCCGGCGGTGGGTAAAGTTATCTCTGGGATAGGTCTGAGCTCGCGAAAAAAGTCGCATCCGGGCATGGCTTGCCCAACTGTGGACCATTGCACAATAGCGAAACAGGCATGCGTTAAGTCACACCACAGACCTTGGAATTAGGGCGATGGCGTACCACACCTTATCGTGGAGCCCACCAAGAGAGCAAAAGTCATTAACGATCAATTTTGTAACAGATCTAATTGGATGGAG'

Variables & the assignment operator, `=`

The last exercise in particular would have been much cleaner if we had a way of referring to that particular string instead of having to write it all out several times!

This is one of the basic use-cases of **variables**!

A variable is a way of keeping a handle on data. Variables can hold the numerical or string data that we've encountered so far, as well as any other type of data, as we'll see later.

In order to create a variable in Python, we use the **assignment operator**, `=`, i.e. the equals sign.

For example

```
a_number_variable = 10
text1 = "aaaaa"
```

### Naming variables

You are **free to choose any name for a variable that you wish**.

The only exceptions are that the variable name cannot contain spaces or other special characters, and cannot correspond to a special Python **keyword** like `if`, `else`, or `for`, as these are reserved for special operations.

While not being *illegal* (*illegal* in programming means that it will give an error), you are also strongly advised not to overwrite built-in function names.

For example it is technically *legal* to name a variable `print`! However, you would then overwrite the print function and no longer be able to print things to the terminal!

Python variables are **case-sensitive**, so a variable called `a` cannot be referred to as `A`, and a variable called `MyNumber` is not the same as `mynumber`!

### Note on variables vs the data they hold

New programmers are sometimes confused by variables vs the data they contain, especially when it comes to string variables.

For example, the following are all valid variable assignments

* `one = "1"` - a variable called `one` that holds the single-character string "1"
* `one = 1` - a variable called `one` that holds the number 1
* `OnE = "one"` - a variable called `OnE` that holds the string "one"

### Using variables

Once a variable has been assigned, we can manipulate its data in exactly the same way as if we were dealing with the data (number, string, etc) directly.

For example

```
print("Joe Bloggs"[-6:])
```

and

```
somename = "Joe Bloggs"
print(somename[-6:])
```

would both output `Bloggs`.

What happened in the second snippet?

1. In the first line we assigned the string `"Joe Bloggs"` to the variable `somename`.
2. Then in the second line, we accessed the last 6 characters of the string using the **slicing** that we learned about above, and printed the result to the terminal.

### Exercise : Basic variable usage

Write a script (name the file `exercise_variables.py`) and create a variable (give it any name you like!) that contains the string

```
"The quick brown fox jumps over the lazy dog"
```

Then create a second variable that contains the text

```
"lazy cat"
```

Now use the `replace` member-function to replace "lazy dog" with the contents of the second variable and assign the result into a third variable.

Remember that a member-function is called using the

```
<VARIABLE>.<FUNCTION NAME>(ARGUMENTS...)
```

syntax.

Lastly print out all three variables.

### More assignment operators

Along with the standard assignment operator, `=`, Python has additional extensions that provide shorthand ways to assign values into a variable.

For example (rhs = right hand side)

* `+=` : add the rhs to the variable; `a += 10` is the same as `a = a + 10`
* `*=` : multiply the variable by the rhs; `a *= 2` is the same as `a = a * 2`
* `/=` : divide the variable by the rhs; `a /= 4` is the same as `a = a / 4`

Where appropriate, this also applies to string data, e.g.

```
a = "Some "
a += "text"
print(a)
```

would output `Some text`.
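As a small sketch of the shorthand operators applied one after the other (the variable names `total` and `label` are made up for this illustration):

```python
# Shorthand assignment operators applied one after the other
total = 10
total += 5      # same as total = total + 5, so total is now 15
total *= 2      # same as total = total * 2, so total is now 30
total /= 4      # same as total = total / 4, so total is now 7.5

label = "Total: "
label += str(total)   # += also joins strings together
print(label)          # prints: Total: 7.5
```

Note that in Python 3 the division operator always produces a floating point number, which is why `total` ends up as `7.5` rather than a whole number.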

Container data-types

In addition to the basic data types covered in the last section, Python has several built-in “containers”.

These come in two main types:

  • Lists and Tuples : store a sequence of values
  • Dictionaries : map keys to values

Lists

A list, as the name implies, is a sequence of items. For example, the numbers 1 – 10 could be arranged in a list: 1,2,3,4,5,6,7,8,9, and 10.

In Python, lists are created using square brackets, and items separated by commas, e.g.:

[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

List elements can be anything: numbers, strings, or even other lists (which creates “nested lists”).

For now though, let’s get to grips with lists by considering flat lists of simple data.

Manipulating lists

Lists can be manipulated using member-functions:

  • grown using
    • insert (anywhere in the list)
    • append (always at the end),
    • extend to add multiple items at the end
  • shortened, using
    • pop (a specific index - without any number the last element will be removed)
    • remove (a specific value)
  • sorted using sort
  • searched using index
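A short sketch that tries each of these member-functions in turn might look like the following. The list contents are made up for illustration, and the comments show how the list changes after each step:

```python
# Try out the list member functions on a small made-up list
numbers = [5, 3, 8]

numbers.append(1)          # add at the end          -> [5, 3, 8, 1]
numbers.insert(0, 9)       # add at index 0          -> [9, 5, 3, 8, 1]
numbers.extend([7, 2])     # add several at the end  -> [9, 5, 3, 8, 1, 7, 2]

last = numbers.pop()       # remove and return the last element (2)
second = numbers.pop(1)    # remove and return the element at index 1 (5)
numbers.remove(8)          # remove the first occurrence of the value 8

numbers.sort()             # sort the list in place
print(numbers)             # [1, 3, 7, 9]
print(numbers.index(7))    # position of the value 7, which is 2
```

Note that `sort`, `append`, `insert`, `extend` and `remove` all change the list itself rather than returning a new list.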

Lists can also be combined using the plus operator, +, e.g.

[1,2,3] + [4,5] 

results in the list [1,2,3,4,5]

The length of a list can be determined using the built-in function len.

Note

len is not a member function of a list, i.e. you cannot call AListVariable.len()

Instead, use

len(AListVariable) 

to return the length of a list

Anything that can be “iterated” over or cycled through, can be converted into a list using the list function.

Python also contains a built-in function called range which is a convenient way of generating a range of numbers (as an iterable that can be converted into a list):

list(range(10)) 

is the same as

[0,1,2,3,4,5,6,7,8,9]
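The range function can also be given a start value and a step size. A minimal sketch of the three ways of calling it (the variable names are just for illustration):

```python
# range(stop) counts from 0 up to, but not including, stop
first_five = list(range(5))          # [0, 1, 2, 3, 4]

# range(start, stop) begins at start instead of 0
three_to_six = list(range(3, 7))     # [3, 4, 5, 6]

# range(start, stop, step) jumps in steps of the given size
evens = list(range(0, 10, 2))        # [0, 2, 4, 6, 8]

print(first_five, three_to_six, evens)
print(len(evens))                    # 5
```

In every case the stop value itself is not included in the result.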

Exercise : List manipulation

Write a script (name the file exercise_lists.py) to perform the following operations (this is just to test them all out!):

  1. Create a variable that contains a list that holds the numbers 10 to 99 (i.e. 10, 11, 12, …, 99)
  2. Add the value 100 to the end of the list
  3. Remove the 20th element of the list (index 19!)
  4. Remove the value 55
  5. Add the elements 5,6,7 to the end of the list (hint use extend with [5,6,7]!)
  6. Print the length of the list.

Tuples

A tuple is similar to a list in that it is a sequence of items.

The main difference between a list and a tuple is that a tuple is immutable, which means that elements cannot be changed, added, or removed once it has been created (i.e. it is “read-only”!).

This can make tuples slightly faster and more memory-efficient than lists, but also less versatile.

As a general rule of thumb, when in doubt, use a list.
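To see what immutable means in practice, here is a small sketch. It uses `try` and `except`, which are not covered in these notes, purely so that the script keeps running after the error; the variable names are made up:

```python
point = (3, 4)          # a tuple is written with round brackets
print(point[0])         # indexing works just like with a list: 3
print(len(point))       # so does len: 2

try:
    point[0] = 10       # attempting to change an element...
except TypeError as err:
    print("Cannot modify a tuple:", err)   # ...raises a TypeError

as_list = list(point)   # convert to a list if changes are needed
as_list[0] = 10
print(as_list)          # [10, 4]
```

The original tuple is left untouched; only the list copy is changed.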

Dictionaries

Lists are great when we have a sequence of data, but the caveat is that in order to get a specific element we need to keep track of its index or repeatedly call the member-function index.

This can get tricky if the list grows and shrinks.

Motivation: why dictionaries?

For example, if we want to keep track of the colours (and names) of fruit, our first attempt might be to keep two lists - one for the names, and one for the colours:

names   = [ "banana", "orange", "strawberry"]
colours = [ "yellow", "orange", "red"]

Then, to find the colour of a strawberry, we first have to find the index of strawberry, and then use that to find the colour:

ind = names.index("strawberry")
print("The colour is : " + colours[ind])

Not only is this clumsy, it’s also likely to lead to issues as the lists might accidentally become unsynchronized (meaning that the order of the name and the order of the colours doesn’t match).

Attempt number two: as briefly mentioned, lists can hold pretty much anything, including other lists, so we could use a list of lists, e.g.

fruit_colours = [ ["banana", "yellow"], ["orange", "orange"], ["strawberry", "red"]]

Now the name will always be paired with the right colour!

The downside to this approach is that if we want to find out what colour a fruit is, we need to do a non-trivial search operation (as we’re trying to find lists in lists!) and can no longer use the index function that we used before.

The bottom line is that we could hack together a solution using lists, but luckily for these kinds of situations, Python offers us a much better solution in the form of a dictionary.

Dictionary creation

A dictionary is created using curly-bracket notation { }, for example continuing our fruit names and colours example, dictionary items are provided as a list of key:value pairs:

fruit_colours = { "banana" : "yellow", "orange" : "orange", "strawberry" : "red"}

Now we can access values using keys:

print(fruit_colours["banana"])

will print yellow to the terminal.

Note about keys

Here we’ve used strings as both the keys and values, but you can also use numbers as keys and/or values.

In fact pretty much anything can be a value (much as with lists), and anything that is hashable can be a key – hashable roughly translates as non-changing. A number or string is hashable, as is a tuple (as explained above), provided that everything inside the tuple is hashable too. Lists and dictionaries are NOT hashable as they can change.
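As a rough illustration of which things can be used as keys, the sketch below uses tuples as keys in a made-up `board` dictionary, and then shows that a list is rejected. The `try` and `except` lines are only there to catch the error so the script can continue:

```python
# Tuples of numbers can be keys; here (row, column) maps to a piece name
board = {(0, 0): "rook", (0, 1): "knight"}
print(board[(0, 1)])        # knight

try:
    bad = {[0, 0]: "rook"}  # a list is not hashable...
except TypeError as err:
    print("Lists cannot be keys:", err)   # ...so this raises a TypeError
```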

Converting a sequence to a dictionary

Dictionaries can be created from a sequence where each item in the sequence has two elements, by using the dict function e.g.

fruit_colours = dict( [ ["banana", "yellow"], ["orange", "orange"], ["strawberry", "red"]] )
fruit_colours = dict( ( ("banana", "yellow"), ("orange", "orange"), ("strawberry", "red")) )

both produce the same dictionary as in the previous section.

Note that while for this example this way of creating a dictionary might seem superfluous, there are scenarios where it is very useful. For example, if we have a function that generates a list of 2-element lists, but we want the result as a dictionary, we can simply convert the list to a dictionary as per above, without having to write a new function!
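One such scenario is the pair of separate lists from the motivation section. As a sketch, assuming the built-in function `zip` (which is not otherwise covered in these notes) to pair up the items of the two lists:

```python
names   = ["banana", "orange", "strawberry"]
colours = ["yellow", "orange", "red"]

# zip pairs up the items of the two lists: ("banana", "yellow"), ...
fruit_colours = dict(zip(names, colours))
print(fruit_colours["strawberry"])   # red
```

This only gives a sensible result if the two lists are in matching order to begin with.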

Manipulating Dictionaries

Once a dictionary has been created, it can be grown or shrunk slightly differently to lists:

  • Adding items : d[NEW_KEY] = NEW_VALUE - creates a new key-value pair, or updates one if it already exists
  • Removing items : d.pop(KEY, DEFAULT) - removes KEY and returns its value, or DEFAULT if KEY doesn’t exist in the dictionary.

As you might have spotted, you can’t assign multiple values with the same key - instead if you write

d[ALREADY_EXISTING_KEY] = NEW_VALUE

the old value that ALREADY_EXISTING_KEY pointed to will be overwritten.
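Putting these operations together, a minimal sketch with a made-up `stock` dictionary could look like this:

```python
stock = {"apples": 3, "pears": 5}   # made-up example data

stock["plums"] = 7        # new key: adds a key-value pair
stock["apples"] = 10      # existing key: overwrites the old value (3)

removed = stock.pop("pears", 0)     # removes "pears" and returns its value, 5
missing = stock.pop("kiwis", 0)     # "kiwis" is not a key, so the default 0 is returned

print(stock)              # {'apples': 10, 'plums': 7}
print(removed, missing)   # 5 0
print("plums" in stock)   # True, as `in` checks whether a key exists
```

The `in` check on the last line is a handy way to find out whether a key is already present before deciding whether to add or update it.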

Exercise : Dictionaries - Wherefore art thou Romeo!

Write a script (name the file exercise_dicts.py) to count the occurrences of the words

  • "sword"
  • "love"
  • "wench"
  • "fool"

in Shakespeare’s Romeo & Juliet.

Use the following initial two lines (feel free to copy and paste!) which will pull the text from an online source and assign it to a variable called text

import urllib.request
text = urllib.request.urlopen("http://www.textfiles.com/etext/AUTHORS/SHAKESPEARE/shakespeare-romeo-48.txt").read().decode('utf8')

Controlling program flow

Up to now, the short scripts we’ve written have been simple linear sequences of commands we want Python to execute.

While some simple tasks won’t go beyond this format, most scripts and programs use conditional execution and looping to achieve their tasks.

Conditional execution: if...else

Many computational tasks require us to perform some operations only under certain conditions, or to fork the computation depending on the value of some variable.

For example, we might have a calculation that at some point generates a distance.

We then want to print the distance in a nice human readable format; if the distance is 0.0001 m we might prefer it to read “0.1 mm”, while if the distance is 32000 m printing “20 miles” would be more useful.

As with most programming languages, this conditional execution is achieved through use of an if-else statement.

For example,

# Somewhere in a chunk of code, we have calculated a distance in meters
# which we store in variable "dist" 
print(dist)                          # This would print 0.0001 or 32000 in the two cases above
if (dist < 0.1):
    print("Which is " + str( dist * 1000 ) + " mm")
elif( dist > 1609 ):
    print("Which is " + str( dist / 1609 ) + " miles")
else:
    print("Which is " + str(dist) + " meters")

Here we’ve shown the three possible parts of an if block; the if part will always be present, while the elif and else are optional.

The syntax is:

if <STATEMENT THAT EVALUATES TO TRUE OR FALSE> : 
    # Stuff to do if it's true

elif <ANOTHER STATEMENT THAT EVALUATES TO TRUE OR FALSE>:     # OPTIONAL 
    # Stuff to do if the `if` part was False but this part is True
 
else: # OPTIONAL 
    # Stuff to do if all previous if/elifs were False. 
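To make the syntax concrete, here is a small self-contained sketch based on the distance example. The value of `dist` and the variable `unit` are made up for illustration; the two print lines at the top show that a comparison is itself a value that is either True or False:

```python
dist = 0.0001            # made-up distance in meters; try 32000 or 12 too

print(dist < 0.1)        # a comparison is itself a value: True
print(dist > 1609)       # False

if dist < 0.1:
    unit = "mm"
elif dist > 1609:
    unit = "miles"
else:
    unit = "meters"

print("Best unit:", unit)   # Best unit: mm
```

Only one of the three branches is ever run: the first one whose condition is True, or the else branch if none of them are.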

Reminder: Indentation!

As mentioned at the end of the first section, the indentation above is not accidental!

Python uses indentation to demarcate logical blocks. In this case, everything after the if statement that is indented is inside the if block.

Returning to the previous level of indentation signals the end of the if block.

The elif and else are the only exceptions in this case, as they are logical extensions to the if block.

To further illustrate this consider the following code;

if a < 5:
    print("Ya")
print("Hoo")

This section of code will always print Hoo (irrespective of the value contained in a) and only print Ya if a is less than 5.

On the other hand

if a < 5:
    print("Ya") 
    print("Hoo")

will either print Ya followed by Hoo (if a is less than 5) or print nothing (if a is more than or equal to 5).

Changing just the indentation level of the second print statement, has determined whether it is part of the if block or not.

Other languages

If you’ve learnt C, C++, or Java, then indentation plays roughly the same role as curly brackets in those languages. For example the last snippet of code in Java would be written as

if( a<5 ){
    System.out.println("Ya");
    System.out.println("Hoo");
}

(where in Java the indentation is optional!)

Looping with for-in

To really start to benefit from having a computer do things for us, we often need it to do almost the same thing repeatedly, with only small changes between each time it repeats.

For example, we might want to batch process a directory full of data files, in which case we change only the filename between each run and the program treats each file in the same way.

We might be iterating over a set of time series and applying the same filters and analyses to each one.

Or on an even finer-scale level, most image processing algorithms work by, at some point, looping over all the pixels in an image and performing the same operation at each location.

Whatever the level of processing, these scenarios all have one thing in common; they’re achieved by using loops. In most cases, we would choose a for loop, which is a way of iterating over a set list of elements (files, time series, pixels). In most languages, we have to explicitly change the base object of each iteration, usually by indexing into a list.

This can also be done with Python. For example

l = ["a","b","c","d","e"]
for i in range(len(l)):
    print(l[i])

outputs:

a
b
c
d
e

Here,

  • we start with a list of 5 letters
  • Then we start a for loop over the list of index values generated by range, 0-4
  • Inside the loop, we print the value of the list at the index value

i is usually referred to as the loop variable as it’s the thing that changes between each invocation of the code inside the loop.

So in each iteration of the loop, i takes on a value (from 0 to 4) and we use that value to index the original list l, and print the item in the list (letter) at index i.

However, in Python, instead of having to loop over indices, we can loop over the items we’re interested in themselves directly;

l = ["a","b","c","d","e"]
for c in l:
    print(c)

outputs:

a
b
c
d
e

While we haven’t saved any lines of code (yet!), the syntax has become much more readable.

Loosely translated:

for c in l: is the same as for each item in l that we’ll call c.

As shown above, we can still use indices if we prefer.

Auto-generating the index variable with enumerate

Python even provides a convenience function, called enumerate, that returns both an index and a value for each item in a sequence:

l = ["a","b","c","d","e"]
for i,c in enumerate(l):
    print("At index ", i, "the item is ", c)

outputs:

At index  0 the item is  a
At index  1 the item is  b
At index  2 the item is  c
At index  3 the item is  d
At index  4 the item is  e

Exercise : Conditional Flow & Loops - number sorting

Write a script (name the file exercise_loops.py) that creates a list of the numbers 0 to 49 inclusive (0, 1, .., 49), and then prints out for each one whether it is odd or even.

Another loop construct: while

There is another looping mechanism we can use instead of the for loop, called a while loop. while loops are mainly used when the exact number of times we want to repeat something is not known before starting the loop. For things like pixels and files in a directory, we would usually use for loops as we know before starting the loop how many files there are or how many pixels there are. An optimization algorithm, on the other hand, usually continues until a desired closeness to an ideal solution has been found (or a maximum number of tries has been reached).

As writing optimization algorithms is difficult, we’re going to simulate one using the random number module instead.

For example

import random # This imports the random number module 
n = 1
tries = 0
while n > 0.1:
    n = random.random()     # This generates a random number between 0 and 1
    tries += 1
print("It took", tries, "tries to roll a random number less than 0.1, =", n)

A typical output from this would be

It took 3 tries to roll a random number less than 0.1, = 0.07209438256934753

So what’s happening here?

After importing the random module (more on this in the next section!), we start with two variables, n for the random number, and tries to hold the number of times we’ve tried.

We set n to 1 so that the while loop will be executed at least once (otherwise if e.g. n starts at 0, then the while condition will immediately be False and the loop would never be entered!), and tries to 0.

Then, inside the while loop, we generate a new value for the random number, and increase tries by 1.

As we then come to the end of the body of the while block, the condition gets tested again - is n still more than 0.1 or not? If so, go again. And so on until n is less than (or equal to) 0.1. Once n is less than or equal to 0.1, the comparison n > 0.1 returns False, and the while loop stops.

Then we proceed to the next line and print the text and values.

Can you guess what would happen if we removed the line that generates a new random value for n?

If you’re not sure, work through the snippet line by line and see if you can figure it out.

If you’re still not sure, ask a demonstrator.
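The "maximum number of tries" idea mentioned earlier can be added to the same loop by combining two conditions with `and`. This is only a sketch; the limit of 100 is an arbitrary choice and `max_tries` is a made-up variable name:

```python
import random

max_tries = 100    # safety limit (an arbitrary choice for this sketch)
n = 1
tries = 0

# Keep going while n is too large AND we still have tries left
while n > 0.1 and tries < max_tries:
    n = random.random()
    tries += 1

if n <= 0.1:
    print("Succeeded after", tries, "tries, n =", n)
else:
    print("Gave up after", tries, "tries")
```

A limit like this guarantees that the loop will eventually stop, even if the first condition never becomes False.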

Exercise : Conditional while loops

Write a script (name the file exercise_while.py) that determines how many squares (starting with the sequence 1² = 1, 2² = 4, 3² = 9, etc) are needed, such that their sum exceeds 1,000,000.

Have the script print out the number of squares needed and their sum (which should exceed 1,000,000!).

Modules

Modules (aka libraries) are how Python developers share functionality they have created with the rest of the Python community.

Python is bundled with a large range of modules in what is called the Python Standard Library.

These encompass a very wide range of functionalities including

  • os : “Miscellaneous operating system interfaces”
  • os.path : “Common pathname manipulations”
  • sys : “System-specific parameters and functions”
  • math : “Mathematical functions”
  • random : “Generate pseudo-random numbers”
  • csv : “CSV File Reading and Writing”

which are modules that I use regularly. The full list is much, much more extensive and includes things like

  • tkinter : “Python interface to Tcl/Tk” (For creating simple Graphical User Interfaces)
  • wave : “Read and write WAV files” (Audio)
  • ftplib : “FTP protocol client”
  • http.server : “HTTP servers” (i.e. Web server)
  • email : “An email and MIME handling package”
  • multiprocessing : “Process-based parallelism”
  • zipfile : “Work with ZIP archives”

to name an assortment of Standard Library modules I’ve used less frequently.

The total list contains hundreds of modules, and covers a wide spectrum of things you can do with a computer!

In addition to the Standard Library, there are many thousands more modules listed in the Python Package Index (69544 in January this year, and 93151 at last count). These range from modules created by lone developers to projects involving hundreds to thousands of developers over many years, such as Scipy. You can browse the list of modules at the Python Package Index or use the search interface to find modules by keywords.

Using modules

To use a Python module we must first import it,

import MODULENAME

Once imported, we access a module’s functions using the dot notation, <MODULENAME>.<FUNCTIONNAME>(...).

For example, above we used the random module to generate random numbers, so after the line

import random

we were able to call a module function using, e.g.

random.random()

Python also allows us to provide an alias for a module, which is a different way of referring to it. For example if I wrote

import random as rnd

I would then be able to later use the random.random function by writing

rnd.random()

This can be handy, especially when using modules with long names, or a submodule of a submodule of a submodule!

Lastly, submodules or functions can also be imported individually if desired, by using the from syntax:

from MODULE import FUNCTION

or

from MODULE.SUBMODULE import FUNCTION

to import a function from a module or submodule, which will then be accessible as FUNCTION (i.e. without needing to specify the module name!).

Similarly

from MODULE import SUBMODULE 

imports a submodule without needing to prefix it with the module name.
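The different import styles can be compared side by side using two Standard Library modules, math and os. This is just an illustrative sketch; the alias `osp` is an arbitrary choice:

```python
import math                   # plain import: use the module name as a prefix
import os.path as osp         # alias for a submodule
from math import sqrt         # import a single function
from os import path           # import a submodule without its prefix

print(math.sqrt(16))          # 4.0
print(sqrt(16))               # 4.0, no prefix needed
print(osp.basename("/path/to/file/sample.txt"))    # sample.txt
print(path.basename("/path/to/file/sample.txt"))   # sample.txt
```

All four styles give access to exactly the same functions; the only difference is the name you type to reach them.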

Creating modules

Creating a module is extremely simple in Python. In fact we’ve created several modules already!

Any file ending in .py is a simple module!

If for example we create a script file called mymod.py and in that script we define a function (which we’ll learn about in the last section of this workshop) called super_function, then we can create another file and import that module using

import mymod

This then gives us access to the function we defined, e.g.

mymod.super_function

Using if __name__ == "__main__": to hide script code

When writing a script to be used as a module, it is useful to have a mechanism for running part of the script only if the script is being run as a script, and not being imported!

This is because when a script is imported, it is essentially run!

To “hide” code from being run when the file is imported, Python provides the following pattern:

def function1(a, b):
    ...

def function2():
    ...


if __name__ == "__main__":
    print("I'm being run as a script lets do script stuff!")
    function1(10,20)

We’ll cover the function definitions for the functions function1 and function2 in the Functions section. Of interest here is the last part of the snippet; by using the special built-in variable __name__ and checking whether it is equal to "__main__", we can check if the file is being run as a script or imported.

This is because the __name__ variable is only equal to "__main__" when being run as a script; if the file is being imported, then __name__ becomes the name of the module.
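A filled-in sketch of this pattern is shown below. The file name greetings.py and the function `greet` are hypothetical, chosen only for this illustration:

```python
# Contents of a hypothetical file called greetings.py

def greet(name):
    # Build and return a greeting for the given name
    return "Hello, " + name + "!"

if __name__ == "__main__":
    # Only runs for `python greetings.py`, not for `import greetings`
    print(greet("world"))
```

Running the file directly would print the greeting, whereas another script that does `import greetings` would print nothing and could simply call `greetings.greet(...)` when needed.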

No exercise here!

While it’s important to understand that it’s actually very easy to create a Python module, doing so isn’t necessary for the context of this course, and so we won’t be practicing this here.

Advanced Users: module folders

More complex modules are created by using a folder or folders that contain a file named in a special way (__init__.py). For example if we have a folder tree:

joesModule / 
    __init__.py
    submodule1.py
    submodule2.py

Then we can use

import joesModule.submodule1 

to import the functionality contained in submodule1.py.

Reading and Writing Data

Python contains a single function for opening files, open.

By passing in a flag for how the file should be opened, we can either read, overwrite, or append to a file.

A file that is opened for reading won’t be modified, so this is always the default mode to prevent accidentally modifying files.

The result of the open function is a file object, which has a number of member functions which we can use to read from or write to the file.

Note: if we open the file using the default read mode, then attempting to call the write or writelines member functions would result in an error.

Aside on relative vs absolute paths

As soon as we want to interact with a file-system, which usually means reading or writing (saving) files, we need to be aware of the difference between relative and absolute paths.

This concept may be a little foreign if you’re used to graphical operating system environments like Windows or MacOS, though you will probably be familiar with the processes of navigating through a file system using e.g. the Windows File Explorer.

For example, when you are viewing the contents of your “My Documents” folder, the full path of that folder is something like

C:\Users\Joe\My Documents

so that when referring to a file in that folder, you would specify the full path to the file as e.g.

C:\Users\Joe\My Documents\File1.txt

However, from within File Explorer, when you are viewing the contents of My Documents, you can double click on File1.txt to open it. In that case, File Explorer’s working directory is C:\Users\Joe\My Documents, so that the relative reference to “File1.txt” makes sense as it means “File1.txt” in the current working directory.

Referring to files in Python code is similar!

If you refer to "File1.txt" in Python code, it will look for a file called "File1.txt" in the current working directory, i.e. the folder from which the script was run (often, but not always, the same folder that the script is in).

If you want to refer to a file that is not in the working directory of the script, you would use its absolute path, e.g. C:\Users\Joe\My Documents\File1.txt

The os module has a submodule called path, (i.e. os.path) which is useful for working with file paths.
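As a brief sketch of what os.path can do, the following builds a relative path and converts it to an absolute one. The folder name "data" is hypothetical and the file does not need to exist for the path functions to work:

```python
import os

print(os.getcwd())                      # the current working directory

# Build a path from its parts using the right separator for this system
data_path = os.path.join("data", "File1.txt")   # hypothetical folder and file

print(os.path.isabs(data_path))         # False, as it is a relative path
full_path = os.path.abspath(data_path)  # resolved against the working directory
print(os.path.isabs(full_path))         # True
print(os.path.basename(full_path))      # File1.txt
print(os.path.exists(full_path))        # whether such a file actually exists
```

On Windows, an absolute path written directly in code should be a raw string, e.g. `r"C:\Users\Joe\My Documents\File1.txt"`, or use forward slashes, so that the backslashes are not interpreted as escape characters.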

Reading files

For the time being, we’re only going to be concerned with the functions readline, and readlines with respect to reading files (though read is also useful!).

These are convenient methods to read either a line at a time (readline) into a string, or the entire file (readlines) into a list of strings.

For example if we have a very simple text file that contains

My 
Name 
is 
Sam

and the file is named sample.txt (which has the full path /path/to/file/sample.txt), then we could use

fd = open("/path/to/file/sample.txt")
print(fd.readline())
fd.close()

and the output would be

My

Similarly, we could use

fd = open("/path/to/file/sample.txt")
print(fd.readlines())
fd.close()

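and the output would be

['My\n', 'Name\n', 'is\n', 'Sam\n']

Note that each string keeps the newline character, \n, from the end of its line (here we assume that the file also ends with a newline).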

If we want to read the file line-by-line (for example, if it is a particularly large file!), we can iterate over the file object itself:

fd = open("/path/to/file/sample.txt")
for line in fd:
    print(line)
fd.close()
    

Why the close?

After opening and accessing the file, we also need to finally close the file, using the close member function.

This is similar to when you work on a word document or any other file; if the file is still open when you try and access it with another program, you will sometimes receive errors, as your computer is warning you that you might still be modifying the file elsewhere and so manipulating it in the meantime is dangerous!
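Because forgetting to call close is an easy mistake to make, Python also offers the `with` statement, which closes the file automatically at the end of the indented block. The sketch below is not part of the original notes' approach; it first creates a small sample file in a temporary folder (using the tempfile module and write mode, which is covered in the next part) so that it can be run as-is:

```python
import os
import tempfile

# Create a small sample file in a temporary folder so the sketch is runnable
folder = tempfile.mkdtemp()
sample_path = os.path.join(folder, "sample.txt")
with open(sample_path, "w") as fd:
    fd.write("My\nName\nis\nSam\n")

# The file is closed automatically at the end of the with block
with open(sample_path) as fd:
    words = []
    for line in fd:
        words.append(line.strip())   # strip removes the trailing newline

print(words)        # ['My', 'Name', 'is', 'Sam']
print(fd.closed)    # True
```

The call to `strip` is what removes the newline character that each line read from the file would otherwise keep.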

Writing files

The process for writing to a file is very similar to reading from a file; first we get a file object using open, except this time we add the additional mode flag as either "w" or "a", corresponding to write which overwrites the original file, or append which only appends to the original file. Note: if the file doesn’t exist before calling open, these two are the same!

Now we use either write or writelines to write either a single string, or a list of strings to the file.

** Important note **

A major difference however between the read and write operations, is that neither write nor writelines insert newline characters. So slightly deceptively,

fd = open('writetest.txt', 'w')
fd.writelines(['a', 'b', 'c'])
fd.close()

would produce a file called writetest.txt that contains

abc

i.e. not three separate lines!

Instead, to actually write three separate lines, we need to add the newline character, \n to each line:

fd = open('writetest.txt', 'w')
fd.writelines(['a\n', 'b\n', 'c\n'])
fd.close()

A common alternative to “manually” adding a newline character to each line, is to use the string member function join and write a single string:

fd = open('writetest.txt', 'w')
fd.write('\n'.join(['a', 'b', 'c']))
fd.close()

Here, join is a string object member function (in this case the string '\n') that takes a list as an input, and joins all of the items in the list using the string object it was called with.
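To see the difference between the "w" and "a" modes, here is a sketch that writes to a file in a temporary folder (created with the tempfile module, an assumption made so that no real files are touched) and reads the contents back after each step. The built-in `repr` is used only to make the newline characters visible when printing:

```python
import os
import tempfile

# Work in a temporary folder so no real files are touched
out_path = os.path.join(tempfile.mkdtemp(), "writetest.txt")

fd = open(out_path, "w")     # "w": create the file (or empty an existing one)
fd.write("a\n")
fd.close()

fd = open(out_path, "a")     # "a": keep what is there and add to the end
fd.write("b\n")
fd.close()

fd = open(out_path)          # default mode: read
after_append = fd.read()
fd.close()
print(repr(after_append))    # 'a\nb\n'

fd = open(out_path, "w")     # "w" again: the old contents are discarded
fd.write("c\n")
fd.close()

fd = open(out_path)
after_overwrite = fd.read()
fd.close()
print(repr(after_overwrite)) # 'c\n'
```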

Exercise : Reading a data file

We have seen above how to read text from a file and display it in the console.

As researchers, we usually need to do more with data and the data is often in numerical format.

Download the data file from here: data_exercise_reading.csv, saving the file to your python exercises scripts directory.

The file contains a table of comma separated values. The values start

Time,Signal
0,100
5,101
10,98
:
9995,102

The first line contains the column headers, and the subsequent lines contain the time and signal values.

Write a new python script (exercise_reading.py) that

  • reads in the headers (but you don’t need to keep these!)
  • and the data values
    • converts the data values into numbers
  • On the signal data, calculate the
    • sum
    • mean
    • and population standard deviation
  • output those statistics to the console

Functions

Now that we’ve learnt about the basics of Python syntax, as well as how to use modules, it’s time to think about starting to modularize our own code!

A function is a way of modularizing code, such that given a set of inputs (or none), the same set of commands are executed each time a function is executed.

Similarity to Mathematical Functions

Functions are related to the idea of mathematical functions e.g. f(x)

Example mathematical function:

f(x) = x + 2

f(5) = ?

Answer:

If f(x) = x + 2

Then f(5) = 5 + 2

Therefore f(5) = 7

Calling functions

We’ve already been using, or calling, functions that were defined by others since we started this workshop. The first function we called was the print function.

All functions are called in the same way:

    <FUNCTION NAME>( VALUE FOR ARGUMENT1, VALUE FOR ARGUMENT2, ...)

where ARGUMENT1, ARGUMENT2 refers to possible input arguments to the function.

In words, functions are called using the name of the function, then open parentheses, then the argument list (comma separated), and then close parentheses. The number of arguments that must be passed into the function depends on how the function was defined.

Defining functions

To define a function in Python, we use the def statement:

def <FUNCTION NAME>( ARGUMENT1, ARGUMENT2, ...):

This is very similar to calling a function, except that we start the line with def and end it with a colon (:).

The number of arguments is up to us and is dependent on what inputs the function needs.

The function body follows and is indented relative to the def ... line.

If we want to return values from our function, we use the return statement:

def <FUNCTION NAME>(ARGUMENT1, ARGUMENT2,...):
    # Function body
    :
    :
    return value1, value2

# Code no longer in function definition

The function body ends when the indentation level returns to that before the def statement.

Our maths example, f(x) = x + 2, could be written as a Python function like this:

def f(x):
    answer = x + 2 
    return answer

or, more compactly:

def f(x):
    return x + 2

A function must be defined before it can be called. Every built-in function and standard library module function we’ve been using is defined somewhere.
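
As a short sketch of how defining and calling fit together, the script below defines f, calls it, and also shows a function that returns two values. The function name min_and_max is made up for this illustration.

```python
def f(x):
    return x + 2


def min_and_max(values):
    # Returning two values packs them into a tuple
    return min(values), max(values)


# The calls come after the definitions
result = f(5)
smallest, largest = min_and_max([4, 1, 9])

print(result)
print(smallest, largest)
```

Moving the line result = f(5) above def f(x): would instead fail with a NameError, because the name f would not exist yet at that point in the script.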

Exercise : Our first function definition

To get used to defining functions, let’s start by defining a trivial function that replaces functionality that we already know.

In a new script file (exercise_function.py)

  • define a function called “add_numbers”
    • that takes two inputs,
    • and returns their sum
  • Print to the console the result of calling the function on 40 and 2.

Exercise : Upgrading our numerical analysis script

We have already written code in previous exercises that could be modularized. Let’s upgrade the code from the exercise on “reading a data file” to a function.

In a new script file (exercise_modularization.py)

  • Copy and paste the code from the “reading a data file” exercise in the previous section
  • Modularize that code by
    • Creating a function definition called “analyze_file” that
      • Takes a file path as an input
      • Opens the file and reads in the data
      • Returns the statistics
    • Then call this function with the data file name as an input, and print the result to the terminal
    • Verify that the result is the same as the original script

Why would we do this?

The answer is that now we can call the same functionality on any file, or more importantly, on many files.

For example, suppose we have 1000 files that all contain such data. Having a single function that is called on each one, instead of a script with a hard-coded input file name, has these benefits:

  • We don’t need to copy the script file 1000 times and change the input file name in each…
  • If we want to add an additional statistic… we don’t need to then update 1000 script files! We only update the one function! The same goes for if we find a bug in the code.
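
The idea can be sketched as follows. The function count_lines is a hypothetical stand-in for an analysis function, and the three small files are created in a temporary directory purely so that the example runs on its own.

```python
import tempfile
from pathlib import Path


def count_lines(file_path):
    """Hypothetical stand-in for an analysis function: count the lines in a file."""
    with open(file_path) as file_handle:
        return len(file_handle.readlines())


# Create three small example files (illustrative data only)
with tempfile.TemporaryDirectory() as folder:
    file_paths = []
    for index in range(3):
        path = Path(folder) / ("data_" + str(index) + ".csv")
        path.write_text("Time,Signal\n" + "0,100\n" * (index + 1))
        file_paths.append(path)

    # The same function is called on every file; only the input changes
    line_counts = []
    for path in file_paths:
        line_counts.append(count_lines(path))

print(line_counts)
```

If count_lines needed a fix or an extra statistic, only the one function definition would change, however many files are processed.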

Positional and keyword arguments

“Traditionally”, function arguments are positional: the value passed at the first position in a function call is assigned to the first variable in the function definition, the second to the second, and so on.

As well as these positional arguments, Python functions often accept keyword arguments. Keyword arguments are provided as <KEYWORD>=<VALUE> pairs.

Keyword arguments are usually optional, because they are typically given default values when the function is defined, whereas positional arguments without default values must always be supplied.

For example even the print function takes several keyword arguments.

If you try the following in a script

print("a", end="\n\n\n\n")
print("b")

You should see an output like

a



b

because the print function used four new line characters instead of the default of one new line character.

Similarly

print("a", end="--NEXT--")
print("b")

would output

a--NEXT--b
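
The same mechanism is available in functions that we define ourselves. As a minimal sketch, the hypothetical function greet below takes one positional argument and one keyword argument with a default value.

```python
def greet(name, greeting="Hello"):
    # name is required; greeting has a default value and is optional
    return greeting + ", " + name + "!"


print(greet("Guido"))
print(greet("Guido", greeting="Goodbye"))
print(greet(greeting="Hi", name="Guido"))
```

The last call shows that when every argument is passed by keyword, the order no longer matters.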

Exercise : Adding a keyword argument

In a new script file (exercise_keyword.py)

  • Copy and paste the function definition for “add_numbers”
  • Rename the function to “add_numbers2”
    • Add a keyword argument in the function definition
      • called absolute
      • that defaults to False
    • Add a couple of lines in the code before the sum that
      • convert all inputs to their absolute values if absolute is True
  • Print to the console the result of calling the function on 40 and 2.
  • Print to the console the result of calling the function on 2 and -2
  • Print to the console the result of calling the function on 2 and -2 if you also pass in the keyword argument absolute set to True

Documenting code

Another feature we need to start adding as our scripts grow is documentation.

Good source code contains comments that explain to other programmers, or remind the author, what the code is doing.

Comments, which start with the hash symbol # (known as the pound symbol in the USA), typically appear every few lines in well-written code.

Here is a section of code pulled at random from a standard Python module:

# From : pathlib.py, line 1000
.
.
.

    def absolute(self):
        """Return an absolute version of this path.  This function works
        even if the path doesn't point to anything.

        No normalization is done, i.e. all '.' and '..' will be kept along.
        Use resolve() to get the canonical path to a file.
        """
        # XXX untested yet!
        if self._closed:
            self._raise_closed()
        if self.is_absolute():
            return self
        # FIXME this must defer to the specific flavour (and, under Windows,
        # use nt._getfullpathname())
        obj = self._from_parts([os.getcwd()] + self._parts, init=False)
        obj._init(template=self)
        return obj

    def resolve(self):
        """
        Make the path absolute, resolving all symlinks on the way and also
        normalizing it (for example turning slashes into backslashes under
        Windows).
        """
        if self._closed:

.
.
.

Ignoring the majority of what’s actually written, this section illustrates a few things regarding commenting:

  • Comments don’t have to appear on every line, just every now and again to help people reading the code
  • Comments are also used to keep track of when things need attention, e.g. the use of FIXME above

Multi-line strings (delimited by three single or double quote symbols, """ ... """) placed immediately after the def line are used to document functions. These are called docstrings, and the Python help system displays them when you call e.g. help(<FUNCTION NAME>).
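
As a small sketch, here is the earlier maths function f with a one-line docstring added. The wording of the docstring is only an example.

```python
def f(x):
    """Return x plus 2."""
    return x + 2


# The docstring is stored on the function itself
print(f.__doc__)

# help() displays the function signature together with its docstring
help(f)
```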

Exercise : Adding documentation

Copy and paste your previous exercise (add_numbers2) into a new script file (exercise_docstring.py) and add a docstring that explains what the function does and how to use it.

Then run python -m pydoc exercise_docstring in the terminal, from your exercises folder.

Final Exercise

Note

Don’t forget to use the good coding suggestions, including frequent commenting / documentation of your code in this exercise.

In light of revelations regarding government agency snooping, you have decided to encrypt your personal communications!

Being a budding Pythonista, and having heard of the Caesar cipher, you will now write a script (exercise_encryption.py) that

Part 1

  • Has an encryption function which takes
    • text (string) and an offset (integer) as inputs
    • converts the text to cipher-text using the given offset to generate the cipher
    • outputs the cipher-text
  • Test your function in the if __name__ == "__main__": section of your code (check back to the module section for what this means!)
    • Using any string you like, and offset 0 - you should get back the original text
    • The same text with offset 1, e.g. the cat should become uifadbu
  • Add a decryption function that
    • takes text and an offset as an input
    • Generates a cipher with the given offset
    • Decrypts the input text using the cipher
    • Returns the decrypted text
    • HINT: think about how the encryption & decryption functions work (i.e. how similar they are…) - maybe you can avoid creating a whole new decryption function ?!?
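
As a reminder of the layout of such a test section, here is a minimal sketch. The function shout is a hypothetical placeholder and has nothing to do with the cipher itself.

```python
def shout(text):
    """Hypothetical example function: return the text in upper case."""
    return text.upper()


if __name__ == "__main__":
    # This block runs when the file is executed as a script,
    # but not when the file is imported as a module
    print(shout("spam"))
```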

Note:

With offset 1, the cipher should become: bcde...xyz a, i.e. the space is included as a character, in contrast to the example on the wiki page.

I.e. the plain-text to cipher translation table for offset 1 should be:

PLAIN  : "abcd...xyz " 
CIPHER : "bcde...yz a"

Ask a demonstrator if you are unclear about this.
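
One possible way to produce this table, shown only as a sketch, is to rotate the alphabet using string slicing. The variable names plain, cipher and offset are assumptions made for this example, and other approaches work equally well.

```python
import string

# Plain alphabet: the lowercase letters followed by a space
plain = string.ascii_lowercase + " "

offset = 1
# Rotate: the characters from position offset onwards, then the first offset characters
cipher = plain[offset:] + plain[:offset]

print("PLAIN  :", repr(plain))
print("CIPHER :", repr(cipher))
```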

Part 2

  • Add a file-reading and writing function that
    • Takes a filename and offset as inputs
    • Opens and reads text in the specified file
    • Calls your encryption function on the text, using the offset provided as an input
    • Writes the output to a new file that keeps the base name and extension of the input file, named <INPUT NAME>-encrypted.<INPUT EXTENSION>
    • Returns the encrypted file name
  • Download the plain text file from here: data_exercise_encryption.txt,
  • Call your script with this file as an input to the previously defined encryption function (and non-zero offset!), and verify that you get a new encrypted file that contains encrypted text.
  • Add a file-reading function that
    • Takes a filename and offset as inputs
    • Opens and reads text in the specified file
    • Calls your decryption function on the text, using the offset provided as an input
    • Returns the resulting decrypted text
  • Print the return value of this function to the terminal (passing the previously encrypted file)
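
The output file name described above could be built as in the sketch below. It assumes the pathlib module from the standard library; the helper name encrypted_name is hypothetical, and os.path.splitext would work just as well.

```python
from pathlib import Path


def encrypted_name(file_name):
    """Hypothetical helper: build <INPUT NAME>-encrypted.<INPUT EXTENSION>."""
    path = Path(file_name)
    # stem is the name without the extension; suffix is the extension including the dot
    return str(path.with_name(path.stem + "-encrypted" + path.suffix))


print(encrypted_name("data_exercise_encryption.txt"))
```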

Part 3: Bonus section

Now that you can encrypt text, and decrypt text that was encrypted with a known offset cipher, let’s test our script using an encrypted file where we don’t know the offset.

To do this we’re going to use a “brute-force” algorithm:

  • download the encrypted text from here: data_exercise_encryption_secret.txt
  • Write a loop over all possible offsets
    • For each offset, decrypt the text and print the final message and offset to the terminal
  • Skim through the decrypted texts; hopefully one and only one of them will make sense!