sbatch: Submits a script to a SLURM partition. The only mandatory parameter is the estimated run time. To submit the script script.sh with a time limit of 24 hours:
$ sbatch -t 24:00:00 script.sh
If the command succeeds, it prints the number of the job (<jobid>). See more detailed information below.
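A minimal sketch of capturing the job number for later use with squeue, scancel or sacct. It assumes sbatch's usual confirmation line, "Submitted batch job <jobid>"; parse_jobid is a hypothetical helper name, not part of SLURM.

```shell
# Extract the job ID from sbatch's confirmation line read on stdin.
# Assumes the line has the form "Submitted batch job <jobid>".
parse_jobid() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Usage on a cluster (not executed here):
#   jobid=$(sbatch -t 24:00:00 script.sh | parse_jobid)
#   echo "Submitted job $jobid"
```

Alternatively, sbatch --parsable prints the job ID without the surrounding text (followed by a semicolon and the cluster name on multi-cluster setups).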
srun: Commonly used to launch a parallel task from within a job script managed by SLURM.
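As an illustration of where srun usually sits, the sketch below writes a small batch script to script.sh. The executable ./my_program and the task count of 4 are assumptions made for the example; adapt them to your own job.

```shell
# Write a minimal batch script; ./my_program is a hypothetical executable.
cat > script.sh <<'EOF'
#!/bin/bash
# Time limit, equivalent to passing -t on the sbatch command line
#SBATCH -t 24:00:00
# Number of tasks (assumed value for this example)
#SBATCH -n 4

# srun launches the tasks inside the allocation granted to this job
srun ./my_program
EOF
```

The resulting file can then be submitted with sbatch script.sh; options given on the sbatch command line take precedence over the #SBATCH lines.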
sinfo: Displays information about SLURM nodes and partitions: the existing partitions (PARTITION), whether or not they are available (AVAIL), the time limit of each partition (TIMELIMIT; if it is infinite, the limit is regulated externally), the number of nodes in each partition and state (NODES), and the node state (STATE). The most common states are: idle (available), alloc (in use), mix (part of the node's CPUs are in use and the rest are available), resv (reserved for a specific use), and drain (temporarily removed for technical reasons).
Information about a specific partition:
$ sinfo -p <partitionname>
Refresh the information every 60 seconds:
$ sinfo -i60
List the reasons why nodes are in the down, drained, fail or failing state:
$ sinfo -R
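For a quick overview of how many nodes are in each state, sinfo output can be summarized with awk. This is a sketch: count_nodes_by_state is a hypothetical helper, and it assumes input lines of the form "<state> <count>", as produced by sinfo -h -o "%t %D" (-h drops the header, %t is the compact state, %D the node count).

```shell
# Sum node counts per state from "<state> <count>" lines read on stdin.
# Note: a node that belongs to several partitions is counted once per partition.
count_nodes_by_state() {
    awk '{ total[$1] += $2 } END { for (s in total) print s, total[s] }' | sort
}

# Usage on a cluster (not executed here):
#   sinfo -h -o "%t %D" | count_nodes_by_state
```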
squeue: Displays information about jobs and their status in the Slurm scheduling queue.
State of a job with jobid <jobid>:
$ squeue -j <jobid>
Report the expected start time and resources to be allocated for pending jobs in order of increasing start time:
$ squeue --start
List all the running jobs:
$ squeue -t RUNNING
List all the pending jobs:
$ squeue -t PENDING
List the jobs submitted to a specific partition:
$ squeue -p <partitionname>
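A common use of squeue in scripts is waiting until a job has left the queue. The following is a sketch under stated assumptions: wait_for_job is a hypothetical helper, the 30-second default interval is arbitrary, and an error from squeue (for example, for a job ID it no longer knows) is treated the same as empty output.

```shell
# Block until the given job no longer appears in the queue.
# $1: job ID; $2: poll interval in seconds (optional, assumed default 30).
wait_for_job() {
    jobid=$1
    interval=${2:-30}
    # -h suppresses the header, so empty output means the job is gone
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}

# Usage on a cluster (not executed here):
#   wait_for_job <jobid> 60
```

Polling too frequently puts load on the scheduler, so an interval of tens of seconds or more is usually a reasonable choice.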
scancel: Used to signal or cancel jobs, job arrays or job steps.
Cancel a job:
$ scancel <jobid>
Cancel all pending jobs:
$ scancel -t PENDING
Cancel one or more jobs with name <jobname>:
$ scancel --name <jobname>
scontrol: Returns detailed information about the nodes, partitions, job steps, and configuration. It is used for monitoring and modifying queued jobs.
Show detailed information about a job:
$ scontrol show jobid -dd <jobid>
Prevent a pending job from being started (without cancelling it):
$ scontrol hold <jobid>
Release a previously held job to begin execution:
$ scontrol release <jobid>
Requeue a running, suspended or finished Slurm batch job into pending state (equivalent to scancel + sbatch):
$ scontrol requeue <jobid>
sacct: Displays accounting data for all jobs and job steps.
Show the accounting information of a specific job:
$ sacct -j <jobid>
Show all the fields with the -l option:
$ sacct -l
To show only specific fields:
$ sacct --format=JobID,JobName,State,NTasks,NodeList,Elapsed,ReqMem,MaxVMSize,MaxVMSizeNode,MaxRSS,MaxRSSNode
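When sacct output is processed by a script, the -P (--parsable2) and -n (--noheader) options produce pipe-delimited lines without a header. The sketch below lists jobs and job steps whose state is not COMPLETED; list_not_completed is a hypothetical helper that expects "JobID|State" lines on stdin.

```shell
# Print the IDs of jobs or job steps whose state is not COMPLETED.
# Expects "JobID|State" lines on stdin, e.g. from:
#   sacct -n -P --format=JobID,State
list_not_completed() {
    awk -F'|' '$2 != "COMPLETED" { print $1 }'
}

# Usage on a cluster (not executed here):
#   sacct -n -P --format=JobID,State | list_not_completed
```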