Basic commands
sbatch
Submits a script to a SLURM partition. The only mandatory parameters are the estimated run time and the estimated memory per node/CPU. For example, to submit the script script.sh with a time limit of 24 hours and 4 GB of memory:
$ sbatch -t 24:00:00 --mem=4GB script.sh
If the command is executed successfully, it returns the number of the job (<jobid>). See more detailed information below.
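Because sbatch reports the job number on success, a wrapper script can capture it for later use with squeue or scancel. The following is a minimal sketch, assuming sbatch prints its usual confirmation line of the form "Submitted batch job <jobid>". The helper name extract_jobid is hypothetical, and the real submission only runs when sbatch and script.sh are present:

```shell
#!/bin/sh
# extract_jobid (hypothetical helper): pull the numeric job ID out of the
# confirmation line that sbatch normally prints, e.g.
# "Submitted batch job 123456". Prints nothing if the line does not match.
extract_jobid() {
    printf '%s\n' "$1" | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Only attempt a real submission when sbatch is installed and script.sh exists.
if command -v sbatch >/dev/null 2>&1 && [ -f script.sh ]; then
    output=$(sbatch -t 24:00:00 --mem=4GB script.sh)
    jobid=$(extract_jobid "$output")
    echo "Job ID: $jobid"
fi
```

Alternatively, sbatch --parsable prints only the job ID (followed by a semicolon and the cluster name on multi-cluster setups), which avoids parsing the confirmation line.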
srun
Commonly used to run a parallel task inside a script controlled by SLURM.
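As an illustration, a batch script submitted with sbatch might launch its parallel step through srun. This is only a sketch: the resource values are illustrative, and run_step is a hypothetical helper that falls back to running the command directly when srun is not installed, so the script can also be tried outside the cluster:

```shell
#!/bin/sh
#SBATCH -t 00:10:00      # estimated run time (10 minutes, illustrative)
#SBATCH --mem=1GB        # estimated memory per node (illustrative)
#SBATCH -n 4             # number of tasks (illustrative)

# run_step (hypothetical helper): launch the given command through srun
# when srun is available, otherwise run it directly.
run_step() {
    if command -v srun >/dev/null 2>&1; then
        srun "$@"
    else
        "$@"
    fi
}

# Inside a SLURM allocation, srun starts this command once per task.
run_step uname -n
```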
sinfo
Displays information about SLURM nodes and partitions. Provides information about:
* Existing partitions (PARTITION)
* Whether or not they are available (AVAIL)
* The maximum time of each partition (TIMELIMIT; if it is infinite, the limit is regulated externally)
* The nodes belonging to each partition (NODES)
* Node state, the most common are:
idle: means available
alloc: means in use
mix: means part of its CPUs are available
resv: means reserved for a specific use
drain: means temporarily removed for technical reasons
Information about a specific partition:
$ sinfo -p <partitionname>
Information every 60 seconds:
$ sinfo -i60
List reasons nodes are in the down, drained, fail or failing state:
$ sinfo -R
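The partition and node-state information above can also be processed in a script. The following sketch sums the nodes reported as idle. It assumes the sinfo format specifiers %P (partition), %a (availability), %l (time limit), %D (node count) and %t (compact state); count_idle_nodes is a hypothetical helper name:

```shell
#!/bin/sh
# count_idle_nodes (hypothetical helper): read lines of the form
# "<partition> <avail> <timelimit> <nodes> <state>" on stdin and print the
# total number of nodes whose state is exactly "idle".
# Caveats: states carrying a suffix (such as "idle*") are not counted, and
# a node that belongs to several partitions is counted once per partition.
count_idle_nodes() {
    awk '$5 == "idle" { total += $4 } END { print total + 0 }'
}

# -h drops the header line; -o selects the columns described above.
if command -v sinfo >/dev/null 2>&1; then
    sinfo -h -o "%P %a %l %D %t" | count_idle_nodes
fi
```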
squeue
Displays information about (your) jobs and their status in the Slurm scheduling queue.
Status of the job with a given <jobid>:
$ squeue -j <jobid>
Report the expected start time and resources to be allocated for pending jobs in order of increasing start time:
$ squeue --start
List all the running jobs:
$ squeue -t RUNNING
List all the pending jobs:
$ squeue -t PENDING
List the jobs demanding a specific partition:
$ squeue -p <partitionname>
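The state filters above can be combined into a quick summary of how many of your jobs are in each state. This is a minimal sketch, assuming the squeue format specifier %T (job state in extended form, such as RUNNING or PENDING); count_by_state is a hypothetical helper name:

```shell
#!/bin/sh
# count_by_state (hypothetical helper): read one job state per line on
# stdin and print "<STATE> <count>" pairs, sorted by state name.
count_by_state() {
    awk 'NF { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# -h drops the header line; -u restricts the listing to one user.
if command -v squeue >/dev/null 2>&1; then
    squeue -h -u "${USER:-$(id -un)}" -o "%T" | count_by_state
fi
```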
scancel
Signals or cancels jobs, job arrays or job steps.
Cancel a job:
$ scancel <jobid>
Cancel all pending jobs:
$ scancel -t PENDING
Cancel one or more jobs with name <jobname>:
$ scancel --name <jobname>
Cancel all your jobs:
$ scancel -u $USER
scontrol
Returns detailed information about the nodes, partitions, job steps, and configuration. It is used for monitoring and modifying queued jobs.
Show detailed information about a job:
$ scontrol show jobid -dd <jobid>
Write the batch script of a given job to a file or, with - as the file name, to stdout:
$ scontrol write batch_script <jobid> -
Prevent a pending job from being started (without cancelling it):
$ scontrol hold <jobid>
Release a previously held job to begin execution:
$ scontrol release <jobid>
Requeue a running, suspended or finished Slurm batch job into pending state (equivalent to scancel + sbatch):
$ scontrol requeue <jobid>
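The output of scontrol show job is a series of Key=Value pairs, so a single field can be pulled out in a script, for example to check the state of a job before holding or releasing it. The following is a minimal sketch; job_field is a hypothetical helper name, and the job ID is taken from the first argument of the script:

```shell
#!/bin/sh
# job_field (hypothetical helper): read "Key=Value" pairs separated by
# spaces or newlines on stdin and print the value of the first key that
# matches the name given as argument.
# Caveat: values that themselves contain spaces are cut at the first space.
job_field() {
    tr -s ' \t' '\n\n' | sed -n "s/^$1=//p" | head -n 1
}

# Query a real job only when scontrol is installed and a job ID was given.
if command -v scontrol >/dev/null 2>&1 && [ -n "$1" ]; then
    scontrol show job "$1" | job_field JobState
fi
```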
sacct
Displays accounting data for all jobs and job steps. This command is used for job monitoring.
Job accounting query, displays accounting data for all jobs and job steps in the Slurm database:
$ sacct
Show the accounting information of a specific job:
$ sacct -j <jobid>
Show all the fields with the -l option:
$ sacct -l
To show only specific fields:
$ sacct --format=JobID,JobName,State,NTasks,NodeList,Elapsed,ReqMem,MaxVMSize,MaxVMSizeNode,MaxRSS,MaxRSSNode
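For scripting, the selected fields are easier to process in pipe-delimited form. The sketch below lists the jobs and job steps that did not complete. It assumes the sacct options -n (no header) and -P (pipe-delimited output); not_completed is a hypothetical helper name:

```shell
#!/bin/sh
# not_completed (hypothetical helper): read "JobID|State|Elapsed" lines on
# stdin and print the job ID and state of every entry whose state is not
# COMPLETED.
not_completed() {
    awk -F'|' 'NF >= 2 && $2 != "COMPLETED" { print $1, $2 }'
}

# Query the accounting database only when sacct is installed.
if command -v sacct >/dev/null 2>&1; then
    sacct -n -P --format=JobID,State,Elapsed | not_completed
fi
```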
sqstat
Detailed information about the queue system, resource consumption, and the status of all partitions and jobs:
$ sqstat
You can ssh into the nodes where one of your jobs is running to check the job status or perform other tasks.
For more detailed information about SLURM commands: http://slurm.schedmd.com/pdfs/summary.pdf