AI nodes (GPU nodes)
To request a GPU in a job, the option
--gres=gpu must be specified. This option and its related flags have some settings which can be useful.
--gres=<list>: Specifies a comma-delimited list of generic consumable resources. The format of each entry on the list is “name[[:type]:count]”. The name is that of the consumable resource. The count is the number of those resources, with a default value of 1. The specified resources will be allocated to the job on each node. The available generic consumable resources are configurable by the system administrator. If the option argument is “help”, a list of available generic consumable resources is printed and the command exits. Examples of use include --gres=gpu:2, --gres=gpu:a100:2 and --gres=help.
--gres-flags=enforce-binding: If set, the only CPUs available to the job will be those bound to the
selected GRES (i.e. the CPUs identified in the gres.conf file will be
strictly enforced rather than advisory). This option may result in
delayed initiation of a job. For example, a job requiring two GPUs and
one CPU will be delayed until both GPUs on a single socket are available
rather than using GPUs bound to separate sockets. However, the
application performance may be improved due to improved communication
speed. Requires the node to be configured with more than one socket;
resource filtering will be performed on a per-socket basis.
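As a rough illustration of the binding flag described above, the sketch below builds an srun command that requests one A100 GPU and restricts the job to the CPUs bound to it. The resource values are copied from the single-GPU example in this section. Whether 32 cores are actually bound to each GPU in the cluster's gres.conf is an assumption, so the job may wait longer or need a different -c value.

```shell
# Sketch only: one A100 GPU, with CPU selection limited to the cores bound to it.
# Assumption: the node's gres.conf binds at least 32 cores to each GPU.
cmd="srun --gres=gpu:a100:1 --gres-flags=enforce-binding -c 32 --mem=64G -t 20 nvidia-smi topo -m"

if command -v srun >/dev/null 2>&1; then
    $cmd          # on the cluster: run the command
else
    echo "$cmd"   # elsewhere: just print what would be run
fi
```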
The following GPU models are available on FinisTerrae III:
The standard NVIDIA A100 nodes have 2 GPUs per node; you can request 1 or 2 GPUs with the option
--gres=gpu:N, where N is 1 or 2. There are also two new special nodes with more GPUs per node:
* 5x NVIDIA A100: to use this node, set --gres=gpu:N where N is a value between 3 and 5.
* 8x NVIDIA A100: to use this node, set --gres=gpu:N where N is a value between 6 and 8.
$ srun --gres=gpu:a100 -c 32 --mem=64G -t 20 nvidia-smi topo -m
$ srun --gres=gpu:a100:2 -c 64 --mem=128G -t 20 nvidia-smi topo -m
CPUs requested for the 2x NVIDIA A100 nodes must be 32 per GPU requested.
CPUs requested for the 5x NVIDIA A100 node must be 12 per GPU requested.
CPUs requested for the 8x NVIDIA A100 node must be 8 per GPU requested.
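The per-GPU CPU rules above can be turned into a small calculation. The helper below is a hypothetical sketch (cpus_for_gpus is not part of the cluster tooling): it maps the number of GPUs requested to the matching -c value, assuming the GPU count alone decides which node type is used, as described in this section.

```shell
# Hypothetical helper: print the total CPU count (-c) for N requested GPUs.
cpus_for_gpus() {
    n=$1
    case "$n" in
        1|2)   per_gpu=32 ;;  # 2x NVIDIA A100 nodes
        3|4|5) per_gpu=12 ;;  # 5x NVIDIA A100 node
        6|7|8) per_gpu=8  ;;  # 8x NVIDIA A100 node
        *)     echo "unsupported GPU count: $n" >&2; return 1 ;;
    esac
    echo $(( n * per_gpu ))
}

# Print the matching srun options for 4 GPUs without submitting anything.
gpus=4
echo "--gres=gpu:${gpus} -c $(cpus_for_gpus "$gpus")"
```

For 4 GPUs this prints `--gres=gpu:4 -c 48`, which can be pasted into an srun or sbatch command line.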
You can find some script examples here.
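As a minimal illustration, the two-GPU srun example above could be written as a batch script along the following lines. This is a sketch, not one of the official examples: the file name gpu_job.sh is hypothetical, and the resource values are simply copied from the interactive command.

```shell
# Write a small batch script (hypothetical name: gpu_job.sh).
cat > gpu_job.sh <<'EOF'
#!/bin/bash
#SBATCH --gres=gpu:a100:2   # two A100 GPUs on a standard node
#SBATCH -c 64               # 32 CPUs per requested GPU
#SBATCH --mem=128G
#SBATCH -t 20               # time limit in minutes

# Show the GPU topology seen by the job.
nvidia-smi topo -m
EOF

echo "Submit with: sbatch gpu_job.sh"
```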
$ compute --gpu
$ srun -p viz --gres=gpu:t4 --mem=8G -t 20 nvidia-smi topo -m