Chapter 7.2: Slurm

Here is a brief overview of the commands you will use to submit jobs to the mesocentre. You can find more information in the mesocentre documentation.

Submitting jobs

The mesocentre uses the Slurm workload manager to schedule the jobs submitted to the cluster. Slurm commands let you submit, monitor, and control jobs.

To submit a job to the mesocentre, create a script containing the Slurm options (the #SBATCH lines) and the instructions to run your code, such as one of the following examples.

Example for CPU job

#!/bin/bash
#SBATCH --job-name=job_name
#SBATCH --output=job_name-%J.out
#SBATCH --error=job_name-%J.err
#SBATCH --time=1:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=skylake
#SBATCH --mem=10GB
#SBATCH --reservation=grayscottcpu

# Show information about the CPUs on the allocated node
lscpu


Example for GPU job

#!/bin/bash
#SBATCH --job-name=job_name
#SBATCH --output=job_name-%J.out
#SBATCH --error=job_name-%J.err
#SBATCH --time=1:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --partition=volta
#SBATCH --mem=10GB
#SBATCH --reservation=grayscottgpu

# List the available GPU cards
nvidia-smi


Then make the script executable and submit the job to the cluster with the following commands:

> chmod +x script.sh
> sbatch script.sh
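
When you script around sbatch, it helps to capture the job ID that it reports. On success, sbatch prints a line of the form "Submitted batch job <id>". The sketch below extracts that number; extract_job_id is a hypothetical helper name chosen for illustration, not a Slurm command.

```shell
# Hypothetical helper: pull the numeric job ID out of sbatch's confirmation
# line, which has the form "Submitted batch job 123456".
extract_job_id() {
    local line="$1"
    local rest
    case "$line" in
        "Submitted batch job "[0-9]*)
            # Drop the fixed prefix, then anything after the leading digits.
            rest="${line#Submitted batch job }"
            echo "${rest%%[!0-9]*}"
            ;;
        *)
            echo "unexpected sbatch output: $line" >&2
            return 1
            ;;
    esac
}

# Typical use on the cluster (needs Slurm, so it is left commented out):
# job_id=$(extract_job_id "$(sbatch script.sh)")
# echo "Submitted job $job_id"
```

If you only need the ID, sbatch also accepts the --parsable option, which prints the job ID without the surrounding text.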


Once the job is submitted, you can check the status of the job with the following command:

> squeue -u <username>
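
The ST column of the squeue output shows each job's state as a short code. As a reading aid, the sketch below maps the most common codes to a description; describe_job_state is a hypothetical helper name, and the list covers only a few of the states Slurm defines.

```shell
# Hypothetical helper: translate a compact squeue state code into words.
# Only a handful of common codes are covered; anything else is reported as unknown.
describe_job_state() {
    case "$1" in
        PD) echo "pending" ;;
        R)  echo "running" ;;
        CG) echo "completing" ;;
        CD) echo "completed" ;;
        F)  echo "failed" ;;
        TO) echo "timeout" ;;
        CA) echo "cancelled" ;;
        *)  echo "unknown" ;;
    esac
}

# Typical use on the cluster (needs Slurm, so it is left commented out).
# -h hides the header; -o "%i %t" prints the job ID and the compact state code.
# squeue -u "$USER" -h -o "%i %t" | while read -r id state; do
#     echo "$id: $(describe_job_state "$state")"
# done
```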


The job's standard output is written to the file job_name-%J.out and its standard error to the file job_name-%J.err, where Slurm replaces %J with the job's identifier.
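
After several submissions, the directory fills up with one .out and one .err file per job. The sketch below finds the most recently modified file for a given job name and extension so that you can inspect the latest run; latest_log is a hypothetical helper name, and it assumes the files follow the job_name-<something>.out naming used in the scripts above.

```shell
# Hypothetical helper: print the most recently modified file matching
# <prefix>-*.<ext>, e.g. latest_log job_name out. Returns 1 if none exists.
latest_log() {
    local prefix="$1"
    local ext="$2"
    local newest=""
    local f
    for f in "$prefix"-*."$ext"; do
        # If the glob matched nothing, it stays literal; skip it.
        [ -e "$f" ] || continue
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    if [ -n "$newest" ]; then
        echo "$newest"
    else
        return 1
    fi
}

# Typical use: follow the output of the latest run as it is written.
# tail -f "$(latest_log job_name out)"
```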

Interactive jobs

You can also run an interactive job on the mesocentre. To do so, you have to use the following command:

> srun -p skylake -n 1 -c 1 -t 3:00:00 --mem=10GB --reservation=grayscottcpu --pty bash


This command allocates one task with one CPU on the skylake partition for 3 hours with 10GB of memory, and opens a bash shell on the allocated node. Once the job is running, you can run your code in that terminal.

To exit the interactive job, use the command exit or press Ctrl+D. This ends the job and frees the allocated resources.
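
To see how each option maps to a resource, the sketch below assembles the same srun command from a partition, a duration in hours, and a memory size, and prints it without running it. build_interactive_cmd is a hypothetical helper name, its defaults are taken from the command above, and the reservation option is left out, so add it back if you have one.

```shell
# Hypothetical helper: print (without executing) an srun command for an
# interactive shell. Arguments: partition, hours, memory; all are optional.
build_interactive_cmd() {
    local partition="${1:-skylake}"   # -p: partition to run on
    local hours="${2:-3}"             # -t: time limit, written as H:00:00
    local mem="${3:-10GB}"            # --mem: memory for the job
    # -n 1 requests one task and -c 1 one CPU for that task;
    # --pty bash attaches an interactive bash shell to it.
    echo "srun -p ${partition} -n 1 -c 1 -t ${hours}:00:00 --mem=${mem} --pty bash"
}

# Print the default command, then a shorter and smaller request.
build_interactive_cmd
build_interactive_cmd skylake 1 4GB
```

Printing the command first makes it easy to check the request before actually asking the scheduler for resources.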