Hi,

I'm using Slurm on a small 8-node cluster. I've recently added one GPU node with two Nvidia A100 cards, one with 40 GB of memory and one with 80 GB.

As usage of this GPU resource increases, I would like to manage it with GRES to avoid usage conflicts. But at the moment my setup does not work, since I can reach a GPU without reserving it:

   srun -n 1 -p tenibre-gpu ./a.out

can use a GPU even though the job does not request this resource (checked by running nvidia-smi on the node). "tenibre-gpu" is a Slurm partition containing only this GPU node.
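
For comparison, my understanding is that once GRES enforcement works, a GPU should only be usable after requesting it explicitly, with something like the following (the --gres option is what I expect to have to use, I have not relied on it so far):

   srun -n 1 -p tenibre-gpu --gres=gpu:1 ./a.out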

Following the documentation I've created a gres.conf file, propagated it to all the nodes (9 compute nodes, 1 login node and the management node), and restarted slurmd.

gres.conf is:

   ## GPU setup on tenibre-gpu-0
   NodeName=tenibre-gpu-0 Name=gpu Type=A100-40 File=/dev/nvidia0 Flags=nvidia_gpu_env
   NodeName=tenibre-gpu-0 Name=gpu Type=A100-80 File=/dev/nvidia1 Flags=nvidia_gpu_env
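
After the restart I assume I can check whether the controller actually picked up the GRES on that node with something like this (I'm not certain of the exact output to expect, the grep is just for convenience):

   scontrol show node tenibre-gpu-0 | grep -i gres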

In slurm.conf I have these settings:

   ## Basic scheduling
   SelectTypeParameters=CR_Core_Memory
   SchedulerType=sched/backfill
   SelectType=select/cons_tres

   ## Generic resources
   GresTypes=gpu

   ## Nodes list
   ....
   Nodename=tenibre-gpu-0 RealMemory=257270 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN
   ....

   #partitions
   PartitionName=tenibre-gpu MaxTime=48:00:00 DefaultTime=12:00:00 DefMemPerCPU=4096 MaxMemPerCPU=8192 Shared=YES State=UP Nodes=tenibre-gpu-0
   ...
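
One thing I was not sure about is whether the node definition also needs a matching Gres= declaration, and whether cgroup device constraining is required to actually hide GPUs that were not requested. As a sketch of what I mean (the counts and the ConstrainDevices line are only my guesses from the documentation, not something I have tested):

   ## slurm.conf: node line with an explicit Gres declaration?
   Nodename=tenibre-gpu-0 Gres=gpu:A100-40:1,gpu:A100-80:1 RealMemory=257270 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN

   ## cgroup.conf: needed so jobs only see the GPUs they request?
   ConstrainDevices=yes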



Maybe I've missed something? I'm running Slurm 20.11.7-1.

Thanks for your advice.

Patrick