[slurm-users] How to use Autodetect=nvml in gres.conf

Dean Schulze Wed, 05 Feb 2020 12:09:20 -0800

I need to configure the GPUs on my nodes dynamically.  The gres.conf
documentation says to use

Autodetect=nvml

in gres.conf instead of adding configuration details for each GPU.  The
docs aren't really clear about this, though, because the example they show
sets AutoDetect=nvml and still lists the details for each GPU:

AutoDetect=nvml
Name=gpu Type=gp100  File=/dev/nvidia0 Cores=0,1
Name=gpu Type=gp100  File=/dev/nvidia1 Cores=0,1
Name=gpu Type=p6000  File=/dev/nvidia2 Cores=2,3
Name=gpu Type=p6000  File=/dev/nvidia3 Cores=2,3
Name=mps Count=200  File=/dev/nvidia0
Name=mps Count=200  File=/dev/nvidia1
Name=mps Count=100  File=/dev/nvidia2
Name=mps Count=100  File=/dev/nvidia3
Name=bandwidth Type=lustre Count=4G

First Question:  If I use Autodetect=nvml, do I also need to specify File=
and Cores= for each GPU in gres.conf?  I'm hoping that with Autodetect=nvml
all I need is the Name= and Type= for each GPU.  Otherwise it's not clear
what the purpose of setting Autodetect=nvml would be.
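
To make the first question concrete, the gres.conf being hoped for would
look roughly like the sketch below.  This is an untested assumption about
what autodetection fills in, not a known-good configuration; the Type=
values are simply the ones from the documentation example above.

```
# Hoped-for gres.conf (untested assumption): File= and Cores= are left out
# in the expectation that NVML autodetection supplies them.
AutoDetect=nvml
Name=gpu Type=gp100
Name=gpu Type=p6000
```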

Second Question:  I installed the CUDA tools from the binary
cuda_10.2.89_440.33.01_linux.run.  When I restart slurmd with
Autodetect=nvml in gres.conf I get this error:

fatal: We were configured to autodetect nvml functionality, but we weren't
able to find that lib when Slurm was configured.

Is there something else I need to configure to tell slurmd how to use nvml?
