In this workshop we will give an overview of the internal OpenStack cloud, part of the £3M ISCA HPC system, and see how, as researchers, we can use this flexible resource to get work done quickly and relatively inexpensively.
You can learn more about ISCA at http://www.exeter.ac.uk/research/hpc/about.
You should be able to access ISCA and all of the functionality described below from your own account for your research. The aim of this workshop is to introduce you to OpenStack and Unix so that you are ready to take full advantage of ISCA’s capabilities!
Some of the concepts and terminology might be confusing at first, so here is an explanation of the key terms, followed by the advantages of using OpenStack.
An Image: This is the starting point or template for the course. Think of it as a master copy of the computer which contains all the programs and data that are required to follow the course. We will use it as a template to start your own Instance.
An Instance: Almost the first thing you will do is create your own copy of the image - we call this an instance or a virtual machine. It contains everything that was in the image plus any files you create during the course.
A Volume: This can be thought of as an additional hard disk which you can add to or remove from an instance.
Remote Desktop: We can’t plug a monitor and keyboard into a virtual machine (instance), so we need a program on our computer which connects to the instance and allows us to see and control it from our desktop. In the course we will use a program called X2Go.
Command Line Client: It is not strictly necessary to connect to an instance using remote desktop software. It is also possible to connect via a command line interface alone. You will not get any menus, icons or images; you will only be able to type commands to tell the instance what you want to do.
Why do we use OpenStack rather than setting everyone up with a physical server?
Physically, 24 computers in the data centre are available for us (and other researchers) to use. We could limit the course to 24 trainees and allocate them one computer each, but while they are scratching their heads trying to understand a new concept, their computer is doing nothing. OpenStack takes advantage of this idle time and gives us much more flexibility: it allows us to pool the 24 physical machines and then subdivide the pool into what are called virtual machines or instances. We can start more virtual machines than there are physical machines, sharing resources like disk space, memory and processors in whatever way we decide. We can even allocate more total processing power to the virtual machines than is actually available. Provided they are not all fully utilised at the same time, it appears to everyone that they have a powerful machine all to themselves.
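To make the idea of allocating more processing power than physically exists a little more concrete, here is a minimal Python sketch of the arithmetic. Only the figure of 24 physical machines comes from the text above; the number of cores per machine, the size of each instance and the overcommit ratio are hypothetical values chosen for illustration, not the real ISCA configuration.

```python
def max_instances(hosts, cores_per_host, vcpus_per_instance, overcommit_ratio):
    """How many instances fit if virtual CPUs may exceed physical cores."""
    total_vcpus = hosts * cores_per_host * overcommit_ratio
    return int(total_vcpus // vcpus_per_instance)


def peak_demand_fits(n_instances, vcpus_per_instance, busy_fraction,
                     hosts, cores_per_host):
    """True if the instances that are busy at once fit on the real cores."""
    demand = n_instances * vcpus_per_instance * busy_fraction
    return demand <= hosts * cores_per_host


HOSTS = 24            # from the text above
CORES_PER_HOST = 16   # hypothetical
VCPUS_PER_VM = 4      # hypothetical

# No overcommit: virtual CPUs map one-to-one onto physical cores.
no_overcommit = max_instances(HOSTS, CORES_PER_HOST, VCPUS_PER_VM, 1.0)

# Hypothetical 4x overcommit: four virtual CPUs promised per physical core.
with_overcommit = max_instances(HOSTS, CORES_PER_HOST, VCPUS_PER_VM, 4.0)

print(no_overcommit, with_overcommit)  # 96 384

# Fine while only a quarter of the instances are busy at any moment...
print(peak_demand_fits(with_overcommit, VCPUS_PER_VM, 0.25, HOSTS, CORES_PER_HOST))
# ...but not if half of them are busy at the same time.
print(peak_demand_fits(with_overcommit, VCPUS_PER_VM, 0.50, HOSTS, CORES_PER_HOST))
```

Under these assumed numbers the pool hosts four times as many instances as it could without overcommitting, and everyone still sees a responsive machine as long as the busy periods do not all coincide.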
Are there other advantages?
With the OpenStack/Virtual Machine approach, you can let other people do the hard work for you. There are hundreds of machine images available, configured with just a basic operating system, or with suites of programs to model weather, analyse genomes or model complex systems. One such list is at https://docs.openstack.org/image-guide/obtain-images.html. You can also create and share your own server images! If you’ve created code or an analysis which a collaborator needs, you can supply them with the OpenStack image (and any associated volumes) and they can run it on their desktop, on OpenStack or on a public cloud such as Amazon EC2.
One of the most useful features is the ability to add storage ‘on-the-fly’ by attaching a Volume to a running instance. Imagine the pain you have to go through if you want to add storage to a physical server. You can also increase the processing power and/or RAM of an instance at the click of a button! Is that genome assembly running out of memory? No problem! Right click on the instance, increase the amount of RAM and restart it.
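The same two operations, adding a volume and giving an instance more resources, can also be scripted rather than clicked. The following is a minimal sketch using the openstacksdk Python library; it assumes that API access is enabled for your account, which this workshop does not rely on, and every name passed in (instance, volume, flavor) is a hypothetical placeholder. In OpenStack, an instance gets more RAM or processors by being resized to a larger "flavor".

```python
def add_volume_and_resize(conn, server_name, volume_name, size_gb, new_flavor_name):
    """Attach a new empty volume to an instance, then resize the instance.

    `conn` is assumed to be an openstacksdk Connection, for example the
    result of openstack.connect(cloud="my-cloud"), where "my-cloud" is a
    hypothetical entry in your clouds.yaml file.
    """
    server = conn.compute.find_server(server_name)
    if server is None:
        raise ValueError(f"no instance named {server_name!r}")

    # Create an empty volume and attach it while the instance keeps running.
    volume = conn.create_volume(size=size_gb, name=volume_name, wait=True)
    conn.attach_volume(server, volume, wait=True)

    # Resizing means moving the instance to a flavor with more RAM/vCPUs.
    flavor = conn.compute.find_flavor(new_flavor_name)
    if flavor is None:
        raise ValueError(f"no flavor named {new_flavor_name!r}")
    conn.compute.resize_server(server, flavor)

    # Assumption: the cloud waits for the resize to be confirmed. Some
    # clouds confirm automatically, in which case this step is not needed.
    server = conn.compute.wait_for_server(server, status="VERIFY_RESIZE")
    conn.compute.confirm_server_resize(server)
    return volume
```

With the SDK installed, a call might look like `add_volume_and_resize(openstack.connect(cloud="my-cloud"), "my-instance", "extra-data", 50, "m1.large")`, where all five arguments are placeholders to be replaced with values from your own project.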
In addition, if you are a proficient coder and are writing parallelised code, you may want to test it on a small virtual cluster before moving it to a full cluster. Clusters of any size can be created with a few clicks in OpenStack and can be configured to run Hadoop, MapReduce and many other types of distributed computing.
For this tutorial I borrowed documentation from the following sites: