Job states

Jobs typically pass through several states in the course of their execution. The typical states are PENDING, RUNNING, SUSPENDED, COMPLETING and COMPLETED. An explanation of each state follows:

  • BF BOOT_FAIL: Job terminated due to launch failure, typically due to a hardware failure (e.g. unable to boot the node or block and the job cannot be requeued).

  • CA CANCELLED: Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.

  • CD COMPLETED: Job has terminated all processes on all nodes with an exit code of zero.

  • CF CONFIGURING: Job has been allocated resources, but is waiting for them to become ready for use (e.g. booting).

  • CG COMPLETING: Job is in the process of completing. Some processes on some nodes may still be active.

  • DL DEADLINE: Job terminated on deadline.

  • F FAILED: Job terminated with non-zero exit code or other failure condition.

  • NF NODE_FAIL: Job terminated due to failure of one or more allocated nodes.

  • OOM OUT_OF_MEMORY: Job experienced an out-of-memory error.

  • PD PENDING: Job is awaiting resource allocation.

  • PR PREEMPTED: Job terminated due to preemption.

  • R RUNNING: Job currently has an allocation and is executing.

  • RD RESV_DEL_HOLD: Job is being held after requested reservation was deleted.

  • RF REQUEUE_FED: Job is being requeued by a federation.

  • RH REQUEUE_HOLD: Held job is being requeued.

  • RQ REQUEUED: Completing job is being requeued.

  • RS RESIZING: Job is about to change size.

  • RV REVOKED: Sibling job was removed from the cluster because another cluster started the job.

  • SI SIGNALING: Job is being signaled.

  • SE SPECIAL_EXIT: The job was requeued in a special state. This state can be set by users, typically in EpilogSlurmctld, if the job has terminated with a particular exit value.

  • SO STAGE_OUT: Job is staging out files.

  • ST STOPPED: Job has an allocation, but execution has been stopped with a SIGSTOP signal. CPUs have been retained by this job.

  • S SUSPENDED: Job has an allocation, but execution has been suspended and CPUs have been released for other jobs.

  • TO TIMEOUT: Job terminated upon reaching its time limit.
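
When scripting around job status, it can help to translate the compact codes into their full names. The following is a minimal Python sketch, not part of Slurm itself: the helper names (expand_state, is_finished, summarize) and the "job ID, then compact code" input format are hypothetical, and the FINISHED grouping is an assumption that treats the states described above as terminated, cancelled or completed as final.

```python
# Compact codes and full state names, copied from the list above.
JOB_STATES = {
    "BF": "BOOT_FAIL", "CA": "CANCELLED", "CD": "COMPLETED",
    "CF": "CONFIGURING", "CG": "COMPLETING", "DL": "DEADLINE",
    "F": "FAILED", "NF": "NODE_FAIL", "OOM": "OUT_OF_MEMORY",
    "PD": "PENDING", "PR": "PREEMPTED", "R": "RUNNING",
    "RD": "RESV_DEL_HOLD", "RF": "REQUEUE_FED", "RH": "REQUEUE_HOLD",
    "RQ": "REQUEUED", "RS": "RESIZING", "RV": "REVOKED",
    "SI": "SIGNALING", "SE": "SPECIAL_EXIT", "SO": "STAGE_OUT",
    "ST": "STOPPED", "S": "SUSPENDED", "TO": "TIMEOUT",
}

# Assumption: these states are treated as final for scripting purposes.
# The grouping is illustrative and is not defined by the list above.
FINISHED = {
    "BOOT_FAIL", "CANCELLED", "COMPLETED", "DEADLINE",
    "FAILED", "NODE_FAIL", "PREEMPTED", "TIMEOUT",
}


def expand_state(code):
    """Return the full state name for a compact code, or "UNKNOWN"."""
    return JOB_STATES.get(code.strip().upper(), "UNKNOWN")


def is_finished(code):
    """Return True if the compact code maps to a state in FINISHED."""
    return expand_state(code) in FINISHED


def summarize(lines):
    """Count jobs per full state name from "<job id> <compact code>" lines."""
    counts = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed format
        name = expand_state(parts[1])
        counts[name] = counts.get(name, 0) + 1
    return counts
```

For example, summarize(["101 R", "102 PD"]) returns a dictionary keyed by full state name, and is_finished("TO") reports whether a timed-out job counts as final under the assumed grouping.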