Slurm basics
The MesoBFC cluster uses Slurm (Simple Linux Utility for Resource Management) for cluster/resource management and job scheduling.
Slurm is responsible for allocating resources to users, providing a framework for starting, executing, and monitoring work on the allocated resources, and scheduling work for future execution.
A Slurm job consists of one or more steps, each consisting of one or more tasks, each using one or more CPUs.
- Jobs are typically created with the `sbatch` command.
- Steps are created with the `srun` command.
- Tasks are requested at the job level with `--ntasks` (`-n`) or `--ntasks-per-node`.
- CPUs are requested per task with `--cpus-per-task` (`-c`).
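As a sketch, these options combine in a batch script as follows (job name, task/CPU counts, and program name are placeholders to adapt):

```bash
#!/bin/bash -l
#SBATCH --job-name="example"   # the job
#SBATCH --ntasks=4             # 4 tasks in the job
#SBATCH --cpus-per-task=2      # 2 CPUs per task (8 CPUs total)
srun ./my_program              # one step, running the 4 tasks
```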
Slurm main commands
| Command | Description |
|---|---|
| `sbatch job.slurm` | Submit a batch job |
| `srun <command>` | Run a step or program inside a job |
| `scontrol show job <jobid>` | Show information on a running job |
| `sacct -j <jobid> <options>` | Show information on a completed job |
| `seff <jobid>` | Display the CPU/memory efficiency of a completed job |
| `scancel <jobid>` | Cancel/kill a job |
| `squeue` | Show running/pending jobs |
| `sinfo` | Show cluster information |
For more information about Slurm: https://slurm.schedmd.com/quickstart.html
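As an illustration of these commands, a typical inspection sequence after submission might look like the following (the job id `12345` is a placeholder):

```bash
$ sbatch job.slurm
Submitted batch job 12345
$ squeue                   # is the job running or pending?
$ sacct -j 12345 --format=JobID,JobName,Elapsed,State
$ seff 12345               # CPU/memory efficiency, once the job has completed
```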
Slurm configuration
In Slurm, multiple nodes can be grouped into partitions: sets of nodes aggregated by shared characteristics or objectives, with associated limits on wall-clock time, job size, etc.
| Partition | Description | #Nodes (cores) | #GPU (type) | TimeLimit (default) | Limits |
|---|---|---|---|---|---|
| mpi1 | MPI-based applications (avx512) | 36 (864) | - | TODO! | TODO! |
| mpi2 | MPI-based applications (icelake) | 48 (2304) | - | TODO! | TODO! |
| gpu | Deep and machine learning applications | 17 (544) | 2x17 (A100) | TODO! | TODO! |
MPI partitions
These partitions are used for parallel MPI applications.
- The number of MPI tasks (`-n` option) must be divisible by 24 for `mpi1` and by 48 for `mpi2`: for example 24, 48… for `mpi1`; 48, 96… for `mpi2`.
- All node memory is allocated, since whole nodes are requested.
- Do not use the `mpirun` command to execute your application; use the `srun` command instead.
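As a quick sanity check before submitting, the divisibility rule can be verified in plain shell (the task count and node width below are just example values):

```shell
#!/bin/bash
# Illustrative helper (not a Slurm tool): check a task count
# against a partition's cores-per-node width.
ntasks=96   # example value for --ntasks
width=48    # cores per node: 48 on mpi2, 24 on mpi1
if [ $((ntasks % width)) -eq 0 ]; then
    echo "ntasks=$ntasks is valid for this partition"
else
    echo "ntasks=$ntasks must be a multiple of $width"
fi
```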
For MPI applications, do not use the `--nodes` (`-N`) Slurm option.
Use `--ntasks` (`-n`) instead to request tasks; Slurm will allocate the nodes accordingly.
Here is an example of an MPI job to adapt to your needs:
Caution: be sure to add the `-l` flag to bash (it sets up the environment).
#!/bin/bash -l
# File: submission.SBATCH
#SBATCH --job-name="MPI_JOB"
#SBATCH --output=%x.%J.out ## %x=job name, %J=job id
#SBATCH --error=%x.%J.out
# walltime (hh:mm:ss), max is 8 days
#SBATCH --time=24:00:00
#SBATCH --partition=mpi1
#SBATCH --ntasks=48 ## request 48 MPI tasks
#SBATCH --mem=0 ## request all nodes memory
# your email address for notifications
#SBATCH --mail-user=votreadresseufc@univ-fcomte.fr ### PLEASE EDIT ME
#SBATCH --mail-type=END,FAIL # notify when job end/fail
module purge
module load openmpi-4.1.5/gcc-13.1.0
srun ./myMPI_Application listParams
Submit your job to Slurm:
$ sbatch mpi.slurm
GPU partition
This partition provides 17 AMD (Zen3) nodes, each with the following configuration:
- 252 GB of memory
- 32 cores
- 2x Nvidia A100
Slurm partition settings:
- Limit of TODO! GPUs per user
- Limit of TODO! GB of memory per job
- TODO! GB of memory allocated per CPU by default (use `--mem=72G`, for example, to request more, or `--mem-per-cpu=32G` to request more per CPU)
- Partition TimeLimit is TODO! days (`TODO!-00:00:00`)
- Default job WallTime is TODO! hours (`--time TODO!:00:00`) if none is given (use the `--time` option to override this value)
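Note that with `--mem-per-cpu`, the total memory of a job follows from the number of allocated CPUs: Slurm multiplies the per-CPU request by the CPU count. A quick sketch of that arithmetic (the values below are illustrative, not partition defaults):

```shell
#!/bin/bash
# Illustrative arithmetic: total memory implied by a per-CPU request.
mem_per_cpu=32    # GB, as in --mem-per-cpu=32G
cpus_per_task=4   # as in --cpus-per-task=4
ntasks=1
total=$((mem_per_cpu * cpus_per_task * ntasks))
echo "implied total memory: ${total} GB"   # prints: implied total memory: 128 GB
```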
Interactive mode
Objective: request a GPU resource and open an interactive shell.
Example:
$ srun --partition=gpu --time=4:00:00 --gres=gpu:1 --pty bash
where:
- `--partition=gpu` uses the gpu partition
- `--time` requests 4 h (format HH:MM:SS)
- `--gres=gpu:1` requests 1 GPU on the node
- `--pty bash` opens a bash session
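Once the shell opens on the GPU node, a quick way to confirm the allocation (illustrative session; actual `nvidia-smi` output will vary):

```bash
$ nvidia-smi   # the allocated A100 should be listed
$ exit         # close the session and release the resources
```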
Batch mode
Here is an example of Slurm script to use:
#!/bin/bash -l
# File: submission.SBATCH
#SBATCH --job-name="GPU_JOB"
#SBATCH --output=%x.%J.out ## %x=job name, %J=job id
#SBATCH --error=%x.%J.out
# walltime (hh:mm:ss), max is 8 days
#SBATCH --time=24:00:00
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
## To request more memory, use --mem option.
## Please don't use more than 128g.
#SBATCH --mem-per-cpu=32G
## your email address for notifications
#SBATCH --mail-user=votreadresseufc@univ-fcomte.fr
#SBATCH --mail-type=END,FAIL
## view allocated GPU cards
nvidia-smi
module purge
module load miniconda3-22.11.1/gcc-13.1.0
conda activate your_env && python GPU_program.py
Submit the job to Slurm:
$ sbatch gpu.slurm
Check job status:
$ squeue
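To narrow `squeue` to your own jobs, or to cancel one, the standard options apply (the job id is a placeholder):

```bash
$ squeue -u $USER   # show only your jobs
$ scancel 12345     # cancel a job by id
```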