Tasks on specific cores
Warning
To use the binding options you must request exclusive use of the nodes by adding the directive #SBATCH --exclusive to your batch script.
On current multicore architectures, obtaining optimal performance generally requires binding processes to physical CPUs (cores). Binding prevents the system from migrating processes between cores and keeps the data in memory as close as possible to the core that reads and writes it.
By default, in the FinisTerrae III configuration each requested task is associated with a cgroup (Linux Control Groups) that does not allow the task's processes to run outside the physical resources assigned to that task.
Most of the nodes have 64 cores and 256 GB of RAM.
- Affinity: Assignment of a process to a certain logical processor.
- Affinity Mask: Bitmask whose bit positions correspond to logical processors. A '1' means the process may run on the logical processor associated with that position in the mask.
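To make the bit-position convention concrete, the following shell sketch decodes a hexadecimal affinity mask into the logical processor indices it enables. The function name mask_to_cpus is hypothetical and introduced only for illustration; it is not part of Slurm.

```bash
# mask_to_cpus MASK
# Hypothetical helper: prints the logical processor indices whose bit is set
# in a hexadecimal mask (with or without a leading 0x), lowest index first.
mask_to_cpus() {
    local hex=${1#0x} pos=0 out="" ch nib bit
    while [ -n "$hex" ]; do
        ch=${hex#"${hex%?}"}      # rightmost hex digit (least significant)
        hex=${hex%?}              # drop that digit
        nib=$(( 0x$ch ))          # value of the digit, 0..15
        bit=0
        while [ "$bit" -lt 4 ]; do
            if [ $(( (nib >> bit) & 1 )) -eq 1 ]; then
                out="$out${out:+ }$(( pos * 4 + bit ))"
            fi
            bit=$(( bit + 1 ))
        done
        pos=$(( pos + 1 ))
    done
    printf '%s\n' "$out"
}

mask_to_cpus 0xf0    # prints: 4 5 6 7
```

For example, the mask 0x5 is 101 in binary, so it enables logical processors 0 and 2.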
1. Task level affinity (Pure MPI)
It is always recommended to specify the number of tasks per node with --ntasks-per-node. Otherwise the queueing system allocates tasks following a block model, trying to fill each node before using the next one while always ensuring at least one task per node.
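The block behaviour described above can be illustrated with a toy model. The helper block_layout below is hypothetical and only mimics the description in this section (one task per node first, then nodes filled in order). It assumes the number of tasks is at least the number of nodes and fits in the allocation, and it is not Slurm's actual placement algorithm.

```bash
# block_layout NTASKS NNODES CORES_PER_NODE
# Hypothetical toy model: prints the number of tasks placed on each node when
# every node first receives one task and the rest fill the nodes in order.
block_layout() {
    local ntasks=$1 nnodes=$2 cores=$3
    local remaining=$(( ntasks - nnodes ))   # tasks left after one per node
    local n=0 out="" extra
    while [ "$n" -lt "$nnodes" ]; do
        extra=$(( cores - 1 ))               # free cores left on this node
        if [ "$extra" -gt "$remaining" ]; then
            extra=$remaining
        fi
        remaining=$(( remaining - extra ))
        out="$out${out:+ }$(( 1 + extra ))"
        n=$(( n + 1 ))
    done
    printf '%s\n' "$out"
}

block_layout 10 2 64    # prints: 9 1
```

Under this model, 10 tasks on two 64-core nodes are placed as 9 and 1, whereas adding #SBATCH --ntasks-per-node=5 requests 5 tasks on each node.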
With the option --cpu_bind=[{quiet,verbose},]type it is possible to specify the affinity of each task/process to a logical processor.
type:
- rank: Automatic affinity according to the MPI rank of the process. The lowest-ranked task on each node is assigned to logical processor 0.
- map_cpu:<list>: Affinity per node to the specified list of logical processors. The example above should perform better by specifying this affinity:
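As an illustration of how a map_cpu list might be built, the sketch below spaces the tasks of a node evenly over its cores. The helper spread_map_cpu and the binary name ./my_mpi_app are hypothetical, the number of tasks is assumed not to exceed the number of cores, and whether an even spread is the best placement depends on the application and on the node's memory layout.

```bash
# spread_map_cpu NTASKS NCORES
# Hypothetical helper: prints a comma-separated list of logical processor IDs,
# one per task, spaced evenly across NCORES cores (assumes NTASKS <= NCORES).
spread_map_cpu() {
    local ntasks=$1 ncores=$2
    local stride=$(( ncores / ntasks )) i=0 list=""
    while [ "$i" -lt "$ntasks" ]; do
        list="$list${list:+,}$(( i * stride ))"
        i=$(( i + 1 ))
    done
    printf '%s\n' "$list"
}

spread_map_cpu 4 64    # prints: 0,16,32,48

# Possible use inside a batch script that also sets --exclusive and
# --ntasks-per-node=4 (the application name is a placeholder):
# srun --cpu_bind=verbose,map_cpu:$(spread_map_cpu 4 64) ./my_mpi_app
```

The verbose keyword makes srun report the binding it applied, which is a convenient way to check that each task landed on the intended logical processor.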
It is recommended to consult the srun man page for the remaining values of this option, since one of them may be better suited to a specific case.
You can find more information about binding on the official Slurm website.
2. Thread level affinity (OpenMP and MPI/OpenMP jobs)
The affinity of the threads spawned by processes parallelized in shared memory (OpenMP in general) is specified through affinity masks: hexadecimal numbers whose binary representation has a "1" in each position corresponding to a logical processor that may be used.
As with MPI or serial jobs submitted to the queueing system, you must request the number of tasks to be executed, together with the number of logical processors assigned to each task using the option -c, --cpus-per-task=<ncpus>. With the option --cpu_bind=[{quiet,verbose},]type it is also possible to specify the affinity of each task/process to logical processors, but in this case a mask option must be given as the type.
type:
- mask_cpu:<list>: With this option, the mask of each of the tasks to be executed per node is specified. It is advisable to use a mask in hexadecimal made up of 6 digits.
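A mask_cpu list for a hybrid MPI/OpenMP job could be generated as in the sketch below, which gives each task a block of consecutive logical processors. The helpers range_mask and task_masks and the binary name ./my_hybrid_app are hypothetical. The masks are not zero-padded, and consecutive numbering of the cores assigned to a task is an assumption that may not match the node's actual topology.

```bash
# range_mask FIRST COUNT
# Hypothetical helper: prints a hex mask with COUNT consecutive bits set,
# starting at logical processor FIRST. Built digit by digit, so it does not
# depend on the width of shell integers.
range_mask() {
    local first=$1 count=$2
    local last=$(( first + count - 1 ))
    local d=$(( last / 4 )) digits="" nib bit cpu
    while [ "$d" -ge 0 ]; do
        nib=0
        bit=0
        while [ "$bit" -lt 4 ]; do
            cpu=$(( d * 4 + bit ))
            if [ "$cpu" -ge "$first" ] && [ "$cpu" -le "$last" ]; then
                nib=$(( nib | (1 << bit) ))
            fi
            bit=$(( bit + 1 ))
        done
        digits="$digits$(printf '%x' "$nib")"
        d=$(( d - 1 ))
    done
    printf '0x%s\n' "$digits"
}

# task_masks NTASKS CPUS_PER_TASK
# Prints one mask per task, comma-separated, for consecutive blocks of cores.
task_masks() {
    local ntasks=$1 cpt=$2 t=0 list=""
    while [ "$t" -lt "$ntasks" ]; do
        list="$list${list:+,}$(range_mask $(( t * cpt )) "$cpt")"
        t=$(( t + 1 ))
    done
    printf '%s\n' "$list"
}

task_masks 4 4    # prints: 0xf,0xf0,0xf00,0xf000

# Possible use in a batch script with --exclusive, --ntasks-per-node=4 and
# --cpus-per-task=4 (the application name is a placeholder):
# srun --cpu_bind=verbose,mask_cpu:$(task_masks 4 4) ./my_hybrid_app
```

Each mask enables exactly the logical processors on which the threads of the corresponding task may run, so the masks of different tasks should not overlap.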
It is recommended to use only the binding options available in Slurm. Mixing the binding options of different packages is not recommended, that is, using the Slurm binding options together with those provided by the MPI or shared-memory parallelization (OpenMP) libraries.
You can find more information about binding on the official Slurm website.