Hi,

I'm using Slurm on a small 8-node cluster. I've recently added one GPU node with two Nvidia A100 cards, one with 40 GB of memory and one with 80 GB.

As usage of this GPU resource increases, I would like to manage it with GRES to avoid usage conflicts. But at the moment my setup does not work, since I can reach a GPU without reserving it:

   srun -n 1 -p tenibre-gpu ./a.out

can use a GPU even though the job does not request this resource (checked by running nvidia-smi on the node). "tenibre-gpu" is a Slurm partition containing only this GPU node.
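
For comparison, my understanding is that once GRES enforcement works, a GPU should only be usable after requesting it explicitly, with something like the following (the --gres option is what I expect to have to use, I have not relied on it so far):

   srun -n 1 -p tenibre-gpu --gres=gpu:1 ./a.out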

Following the documentation I've created a gres.conf file, propagated it to all the nodes (9 compute nodes, 1 login node and the management node), and restarted slurmd.

gres.conf is:

   ## GPU setup on tenibre-gpu-0
   NodeName=tenibre-gpu-0 Name=gpu Type=A100-40 File=/dev/nvidia0 Flags=nvidia_gpu_env
   NodeName=tenibre-gpu-0 Name=gpu Type=A100-80 File=/dev/nvidia1 Flags=nvidia_gpu_env
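
After the restart I assume I can check whether the controller actually picked up the GRES on that node with something like this (I'm not certain of the exact output to expect, the grep is just for convenience):

   scontrol show node tenibre-gpu-0 | grep -i gres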

In slurm.conf I have these settings:

   ## Basic scheduling
   SelectTypeParameters=CR_Core_Memory
   SchedulerType=sched/backfill
   SelectType=select/cons_tres

   ## Generic resources
   GresTypes=gpu

   ## Nodes list
   ....
   Nodename=tenibre-gpu-0 RealMemory=257270 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN
   ....

   #partitions
   PartitionName=tenibre-gpu MaxTime=48:00:00 DefaultTime=12:00:00 DefMemPerCPU=4096 MaxMemPerCPU=8192 Shared=YES State=UP Nodes=tenibre-gpu-0
   ...
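
One thing I was not sure about is whether the node definition also needs a matching Gres= declaration, and whether cgroup device constraining is required to actually hide GPUs that were not requested. As a sketch of what I mean (the counts and the ConstrainDevices line are only my guesses from the documentation, not something I have tested):

   ## slurm.conf: node line with an explicit Gres declaration?
   Nodename=tenibre-gpu-0 Gres=gpu:A100-40:1,gpu:A100-80:1 RealMemory=257270 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN

   ## cgroup.conf: needed so jobs only see the GPUs they request?
   ConstrainDevices=yes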



Maybe I've missed something? I'm running Slurm 20.11.7-1.

Thanks for your advice.

Patrick