High-Performance Computing

A hands-on workshop covering High-Performance Computing (HPC).

This workshop assumes you have a basic understanding of the Unix Operating System. If not, then you should take a look at the Hands-on Unix Workshop.

Introduction

If you are attending a workshop called "High Performance Computing" then you can skip ahead to Topic 1, as the introduction will be covered in the introductory presentation (slides).

What is an HPC?

An HPC is simply a large collection of server-grade computers working together to solve large problems.

Big: HPCs typically have lots of CPUs and memory and can consequently run large jobs.
Shared: There are usually lots of users making use of it at one time.
Coordinated: There is a coordinator program to ensure fair use among its users.
Compute Collection: HPCs use a number of computers at once to solve lots of large jobs.

HPC Structure

Figure: The user (face at top) interacts with their local PC/Laptop through the keyboard and screen. The PC/Laptop will connect to the Head/Login node of the HPC interactively. The Head/Login node will send the jobs off to the Compute Nodes when one is available.

Why use HPCs?

The main reason we use HPCs is that they are big. Given their size, they are usually very expensive; however, by sharing the resources, the cost per user or per job can be kept low.

Many CPUs: HPCs typically have hundreds to tens of thousands of CPUs. Compare this with the 4 or 8 that your PC/Laptop might have.
Large Memory: hundreds of GBs to multiple TBs of RAM are typical for each node.
Efficient use: by sharing the resources, each user can have access to a very large computer for a period and hand it back for others to use later.

Software Modules

There are typically hundreds to thousands of software packages installed on an HPC. Given that each can have its own special requirements and that multiple versions will be installed, software on the HPC is most commonly packaged and only made available to you when you request it.

Packaged: to avoid conflicts between software, each package is bundled into a module and only used on demand.
Loadable: before using a software module you need to load it.
Versions: not all users want to use the same version of a piece of software, and you might need a specific older version to compare new results with old ones. Each version is therefore made into its own software module, so you control exactly which one you use.
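
As a rough sketch of what "load before use" looks like in practice, the commands below assume the HPC provides the widely used `module` command (Environment Modules or Lmod). The module name `examplesoft/1.0` is a hypothetical placeholder, not a package known to be installed on LIMS-HPC. The guard lets the snippet run harmlessly on a machine without a module system.

```shell
#!/bin/bash
# Sketch only: assumes the `module` command (Environment Modules / Lmod) exists.
# MODULE_NAME is a hypothetical placeholder; pick a real name from `module avail`.
MODULE_NAME="examplesoft/1.0"

if type module >/dev/null 2>&1; then
    module avail                    # list the software modules that can be loaded
    module load "$MODULE_NAME" \
        || echo "Could not load $MODULE_NAME; choose a name from the list above."
    module list                     # show which modules are currently loaded
    MODULE_STATUS="available"
else
    echo "The 'module' command was not found; try this on the HPC login node."
    MODULE_STATUS="missing"
fi
```

Including the version in the name (the part after the slash) is what lets two users, or two of your own jobs, use different versions of the same software side by side.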

Job Submission

Job Submission is the process of instructing the HPC to perform a task for you. The exact process depends on the scheduling software installed on your HPC.

SLURM: this workshop uses an HPC that runs the SLURM scheduling software. Some common alternatives (not covered) are PBS and SGE/OGE.
Queues (Partition): when a job is submitted it is added to a work queue; in SLURM this is called a Partition.
Batch: HPC jobs are not 'interactive'. By this we mean that you can't type input into your job's programs and you won't immediately see the output that your program prints on the screen.
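
To make the three points above concrete, here is a minimal sketch using standard SLURM client commands. It assumes you are logged in to a SLURM login node with a default partition configured; elsewhere the guard simply prints a message.

```shell
#!/bin/bash
# Sketch only: assumes the SLURM client commands are installed.
if command -v sbatch >/dev/null 2>&1; then
    sinfo                        # list the partitions (queues) and the state of their nodes
    sbatch --wrap="uname -n"     # submit a one-line batch job; prints "Submitted batch job <id>"
    squeue -u "$(whoami)"        # show your jobs that are waiting or running
    # Batch output is not shown on screen: by default SLURM writes it to
    # slurm-<id>.out in the directory the job was submitted from.
    SLURM_FOUND="yes"
else
    echo "SLURM commands not found; try this on the HPC login node."
    SLURM_FOUND="no"
fi
```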

Resources

So that SLURM knows how to schedule and fit jobs around each other, you need to specify what resources your job will use. That is, you need to tell it how many CPUs, how much RAM, how many Nodes (servers), and how much Time you need.

CPUs: most software is limited to using 1 CPU by default but many programs can use more than one (or you can run multiple copies at once). The number of CPUs you specify needs to match how many things your software can do at once.
Memory: you need to estimate (or guess) how much memory (RAM) your program needs.
Nodes: most software will only use one of the HPC's Nodes (i.e. one server); however, some software can make use of more than one to solve the problem sooner.
Time: as when scheduling meetings, SLURM needs to know the maximum time each job will take so it can organise other jobs afterwards.
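
The four resources above map onto `#SBATCH` options at the top of a job script. The sketch below writes such a script to a file and checks its syntax; the file name `example_job.sh`, the job name, and every resource value are illustrative assumptions rather than recommended settings for LIMS-HPC.

```shell
#!/bin/bash
# Write a minimal SLURM job script. All values are illustrative placeholders.
cat > example_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
# Nodes: run on a single server
#SBATCH --nodes=1
# CPUs: one copy of the program, using one CPU
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# Memory: estimated RAM needed by the job
#SBATCH --mem=1G
# Time: maximum run time as HH:MM:SS
#SBATCH --time=00:10:00

echo "Running on $(uname -n)"
EOF

# Check the script for shell syntax errors without running it.
bash -n example_job.sh && echo "example_job.sh looks OK"
```

On the HPC, the script would then be submitted with `sbatch example_job.sh`. To the shell the `#SBATCH` lines are ordinary comments, which is why the same file can also be run directly with `bash` for a quick test.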

Job Types

There are two types of jobs that you can submit:

Shared: a shared job (as the name suggests) is one that shares a node with other jobs. This is the default and preferred method.
Exclusive: an exclusive job gets one or more nodes to itself. Because nothing else can run on those nodes, the job should be able to use multiple CPUs, as most HPCs have at least 16 CPUs per node.
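
In SLURM, a job is shared unless you ask otherwise; the `--exclusive` option requests whole nodes. The sketch below is an assumption-laden illustration: the file name `exclusive_job.sh` and the time limit are hypothetical, and the script only reports how many CPUs it was given instead of running real multi-CPU software.

```shell
#!/bin/bash
# Write a minimal exclusive job script. Values are illustrative placeholders.
cat > exclusive_job.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --time=00:10:00

# SLURM sets SLURM_CPUS_ON_NODE inside a job; fall back to 1 elsewhere.
CPUS="${SLURM_CPUS_ON_NODE:-1}"
echo "This job may use $CPUS CPUs on $(uname -n)"
EOF

# Run it locally as a plain shell script to see what it prints outside SLURM.
bash exclusive_job.sh
```

A real exclusive job would pass the CPU count on to the software (for example through its threads option) so that the reserved CPUs do not sit idle.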

How to use this workshop

The workshop is broken up into a number of Topics each focusing on a particular aspect of HPCs. You should take a short break between each to refresh and relax before tackling the next.

Topics may start with some background followed by a number of exercises. Each exercise begins with a question, then sometimes a hint (or two) and finishes with the suggested answer.

Question

An example question looks like:

What is the Answer to Life?

Hint

Depending on how much of a challenge you like, you may choose to use hints. Even if you work out the answer without hints, it's a good idea to read the hints afterwards because they contain extra information that is good to know.

Note: hints may be staged; that is, there may be a "More" section within a hint for further hints.

Hint <- click here to reveal hint

What is the answer to everything?

As featured in "The Hitchhiker's Guide to the Galaxy"

More <- and here to show more

It is probably a two digit number

Answer

Once you have worked out the answer to the question expand the Answer section to check if you got it correct.

Answer <- click here to reveal answer

Answer: 42

Ref: Number 42 (Wikipedia)

Usage Style

This workshop attempts to cater for two usage styles:

Problem solver: for those who like a challenge and learn best by trying to solve the problems by themselves (hints optional):
- Attempt to answer the question by yourself.
- Use hints when you get stuck.
- Once solved, reveal the answer and read through our suggested solution.
- It's a good idea to read the hints and answer description as they often contain extra useful information.
By example: for those who learn by following examples (expand all sections):
- Expand the Answer section at the start of each question, follow along with the commands that are shown, and check you get the same (or similar) answers.
- It's a good idea to read the hints and answer description as they often contain extra useful information.

Connecting to HPC

To begin this workshop you will need to connect to an HPC. Today we will use the LIMS-HPC. The computer called lims-hpc-m.latrobe.edu.au (m is for master, which is another name for head node) is the one that coordinates all the HPC's tasks.

Server details:

host: lims-hpc-m.latrobe.edu.au
port: 6022
username: trainingXX
password: (provided at workshop)

Connection instructions:

Mac OS X / Linux

Both Mac OS X and Linux come with a version of ssh (called OpenSSH) that can be used from the command line. To use OpenSSH you must first start a terminal program on your computer. On OS X the standard terminal is called Terminal, and it is installed by default. On Linux there are many popular terminal programs including: xterm, gnome-terminal, konsole (if you aren't sure, then xterm is a good default). When you've started the terminal you should see a command prompt. To log into LIMS-HPC, type this command at the prompt and press return (replacing trainingXX with your LIMS-HPC username):

$ ssh -p 6022 trainingXX@lims-hpc-m.latrobe.edu.au

The same procedure works for any other machine where you have an account. If the remote computer accepts ssh connections on a port other than the default (22), as LIMS-HPC does, you need to specify the port by adding the option -p PORT, with PORT replaced by the port number.
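
If you connect often, OpenSSH can remember the host, port and username for you. The fragment below is a sketch of an entry for the per-user configuration file `~/.ssh/config`; the alias `lims-hpc` is a hypothetical name chosen for this example, and trainingXX again stands for your own username.

```
# ~/.ssh/config (sketch; "lims-hpc" is a made-up alias)
Host lims-hpc
    HostName lims-hpc-m.latrobe.edu.au
    Port 6022
    User trainingXX
```

With an entry like this in place, typing `ssh lims-hpc` should behave the same as the full command shown earlier.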

You may be presented with a message along the lines of:

The authenticity of host 'lims-hpc-m.latrobe.edu.au (131.172.24.10)' can't be established.
...
Are you sure you want to continue connecting (yes/no)?

Although you should never ignore a warning, this particular one is nothing to be concerned about; type yes and then press enter. If all goes well you will be asked to enter your password. Assuming you type the correct username and password the system should then display a welcome message, and then present you with a Unix prompt. If you get this far then you are ready to start entering Unix commands and thus begin using the remote computer.

Windows

On Microsoft Windows (Vista, 7, 8, 10) we recommend that you use the PuTTY ssh client. PuTTY (putty.exe) can be downloaded from this web page:

http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html

Documentation for using PuTTY is here:

http://www.chiark.greenend.org.uk/~sgtatham/putty/docs.html

When you start PuTTY you should see a window which looks something like this:

Figure: PuTTY connection dialog

To connect to LIMS-HPC you should enter lims-hpc-m.latrobe.edu.au into the box entitled "Host Name (or IP address)" and 6022 in the port, then click on the Open button. All of the settings should remain the same as they were when PuTTY started (which should be the same as they are in the picture above).

In some circumstances you will be presented with a window entitled PuTTY Security Alert. It will say something along the lines of "The server's host key is not cached in the registry". This is nothing to worry about, and you should agree to continue (by clicking on Yes). You usually see this message the first time you try to connect to a particular remote computer.

If all goes well, a terminal window will open, showing a prompt with the text "login as:". An example terminal window is shown below. You should type your LIMS-HPC username and press enter. After entering your username you will be prompted for your password. Assuming you type the correct username and password the system should then display a welcome message, and then present you with a Unix prompt. If you get this far then you are ready to start entering Unix commands and thus begin using the remote computer.

Figure: PuTTY login screen

Topic 1: Exploring an HPC

An HPC (short for 'High Performance Computer') is simply a collection of Server Grade computers that work together to solve large problems.

HPC Structure

Figure: Overview of the computers involved when using an HPC. Computer systems are shown in rectangles and arrows represent interactions.

Exercises

1.1) What is the contact email for your HPC's System Administrator?

Hint

When you log in, you will be presented with a message; this is called the Message Of The Day and usually includes lots of useful information. On LIMS-HPC this includes a list of useful commands, the last login details for your account, and the contact email of the system administrator.
