Basic commands

sbatch

Submits a script to a SLURM partition. The only mandatory parameters are the estimated time and the estimated memory per node/CPU. For example, to submit the script script.sh with a time limit of 24 hours and 4 GB of memory:

$ sbatch -t 24:00:00 --mem=4GB script.sh

If the command is executed successfully, it returns the number of the job (<jobid>). See more detailed information below.
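
Since the job number is needed by most of the commands below, it can be handy to capture it at submission time. The following is a minimal sketch, assuming the --parsable option of sbatch is available on your installation; submit_job is a hypothetical helper name and not part of SLURM.

```shell
# submit_job is a hypothetical helper, not a SLURM command.
# It submits a script and prints only the job number.
submit_job() {
    local out
    # --parsable makes sbatch print "<jobid>" or "<jobid>;<cluster>"
    out=$(sbatch --parsable "$@") || return 1
    # Drop the optional ";<cluster>" suffix
    printf '%s\n' "${out%%;*}"
}

# Usage (on a system where sbatch is available):
#   jobid=$(submit_job -t 24:00:00 --mem=4GB script.sh)
#   echo "Submitted job $jobid"
```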

srun

Commonly used to run a parallel task from within a script controlled by SLURM.
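
As an illustration, the sketch below writes a small batch script that uses srun. The file name srun_example.sh, the task count and the time and memory values are assumptions chosen for the example; adjust them to your job.

```shell
# Write an example batch script (hypothetical file name and values).
cat > srun_example.sh <<'EOF'
#!/bin/bash
# Estimated time and memory, given as directives instead of on the command line
#SBATCH -t 00:10:00
#SBATCH --mem=4GB
# Number of tasks (assumed value for this example)
#SBATCH -n 4

# srun launches one copy of the command per task of the allocation
srun hostname
EOF
```

The script would then be submitted with sbatch srun_example.sh; with 4 tasks, hostname should run 4 times on the allocated node(s).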

sinfo

Displays information about SLURM nodes and partitions. Provides information about:

  • Existing partitions (PARTITION)

  • Whether or not they are available (AVAIL)

  • The maximum time of each partition (TIMELIMIT); if it is infinite, the limit is regulated externally

  • The nodes belonging to each partition (NODES)

  • Node state (STATE); the most common are:

      • idle: available

      • alloc: in use

      • mix: some of the node's CPUs are in use and the rest are available

      • resv: reserved for a specific use

      • drain: temporarily removed for technical reasons

Information about a specific partition:

$ sinfo -p <partitionname>

Refresh the information every 60 seconds:

$ sinfo -i60

List reasons nodes are in the down, drained, fail or failing state:

$ sinfo -R
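
The output of sinfo can also be reduced to a single number for use in scripts. A minimal sketch, assuming the standard -h (no header), -t (state filter) and -o (output format) options; idle_nodes is a hypothetical helper name.

```shell
# idle_nodes is a hypothetical helper, not a SLURM command.
# Prints the number of idle nodes in the given partition.
idle_nodes() {
    # -h: no header, -t idle: only idle nodes, -o "%D": node count per output line
    sinfo -h -p "$1" -t idle -o "%D" |
        awk '{ total += $1 } END { print total + 0 }'
}

# Usage (on a system where sinfo is available):
#   idle_nodes <partitionname>
```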

squeue

Displays information about (your) jobs and their status in the Slurm scheduling queue.

State of the job with a given jobid:

$ squeue -j <jobid>

Report the expected start time and resources to be allocated for pending jobs in order of increasing start time:

$ squeue --start

List all the running jobs:

$ squeue -t RUNNING

List all the pending jobs:

$ squeue -t PENDING

List the jobs submitted to a specific partition:

$ squeue -p <partitionname>
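
A common use of squeue in scripts is to wait until a job has left the queue. The sketch below polls squeue for that purpose; wait_for_job and its default interval of 60 seconds are assumptions made for this example. Note that it treats any empty answer as "finished", including the case where squeue itself fails.

```shell
# wait_for_job is a hypothetical helper, not a SLURM command.
# Blocks until the given job no longer appears in the queue.
wait_for_job() {
    local jobid interval
    jobid=$1
    interval=${2:-60}   # seconds between polls (assumed default)
    # -h suppresses the header, so the output is empty once the job is gone
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}

# Usage (on a system where squeue is available):
#   wait_for_job <jobid> 30
```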

scancel

Signals or cancels jobs, job arrays or job steps.

Cancel a job:

$ scancel <jobid>

Cancel all pending jobs:

$ scancel -t PENDING

Cancel one or more jobs with name <jobname>:

$ scancel --name <jobname>

Cancel all of your jobs:

$ scancel -u $USER

scontrol

Returns detailed information about nodes, partitions, job steps, and configuration. It is used for monitoring and modifying queued jobs.

Show detailed information about a job:

$ scontrol show jobid -dd <jobid>

Write the batch script of a given job to a file or, when - is given as the file name, to stdout:

$ scontrol write batch_script <jobid> -

Prevent a pending job from being started (without cancelling it):

$ scontrol hold <jobid>

Release a previously held job to begin execution:

$ scontrol release <jobid>

Requeue a running, suspended or finished batch job into the pending state (similar to scancel followed by sbatch, but the job keeps its jobid):

$ scontrol requeue <jobid>
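
The output of scontrol show job is a list of Key=Value pairs, so a single field can be extracted with standard text tools. A minimal sketch, assuming that output layout; job_field is a hypothetical helper name.

```shell
# job_field is a hypothetical helper, not a SLURM command.
# Prints the value of one Key=Value field of a job, e.g. JobState.
job_field() {
    # Put each Key=Value pair on its own line, then print the value
    # of the first pair whose key matches exactly
    scontrol show job "$1" | tr -s ' ' '\n' | sed -n "s/^$2=//p" | head -n 1
}

# Usage (on a system where scontrol is available):
#   job_field <jobid> JobState
```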

sacct

Displays accounting data for jobs and job steps stored in the Slurm database. This command is used for job monitoring.

With no options, it lists your jobs and job steps since midnight of the current day:

$ sacct

Show the accounting information of a specific job:

$ sacct -j <jobid>

Show all the fields with the -l option:

$ sacct -l

To show only specific fields:

$ sacct --format=JobID,JobName,State,NTasks,NodeList,Elapsed,ReqMem,MaxVMSize,MaxVMSizeNode,MaxRSS,MaxRSSNode
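
For scripted checks, the machine-readable output of sacct is easier to process than the default table. The sketch below lists jobs that ended in a state other than COMPLETED; it assumes the standard -X, -n and -P options, and problem_jobs is a hypothetical helper name.

```shell
# problem_jobs is a hypothetical helper, not a SLURM command.
# Prints jobs that are neither completed, running nor pending.
problem_jobs() {
    # -X: one line per job (no steps), -n: no header, -P: fields separated by "|"
    sacct -X -n -P --format=JobID,JobName,State "$@" |
        awk -F'|' '$3 != "COMPLETED" && $3 != "RUNNING" && $3 != "PENDING" { print $1, $2, $3 }'
}

# Usage (on a system where sacct is available):
#   problem_jobs
#   problem_jobs -j <jobid>
```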

sqstat

Detailed information about the queue system, resource consumption, and the status of all partitions and jobs:

$ sqstat

You can ssh into the nodes where one of your jobs is running to check the job status or perform other tasks.

For more detailed information about SLURM commands: http://slurm.schedmd.com/pdfs/summary.pdf