That usually means you don't have the nvidia kernel module loaded,
probably because there's no driver installed.
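
A quick way to check both on the compute node is something like the sketch below. It is only an illustration, not anything Slurm ships: the helper names are made up, and it assumes the usual Linux layout in which loaded modules are listed in /proc/modules and the driver creates per-GPU device files named /dev/nvidia0, /dev/nvidia1, and so on.

```python
import glob
import os
import re


def nvidia_module_loaded(proc_modules_text):
    """Return True if a module named exactly 'nvidia' appears in
    /proc/modules-style text (the first field of each line is the module name)."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "nvidia":
            return True
    return False


def nvidia_device_files(dev_dir="/dev"):
    """Return the per-GPU device files (nvidia0, nvidia1, ...) found in dev_dir.
    Control nodes such as nvidiactl or nvidia-uvm are skipped on purpose."""
    candidates = glob.glob(os.path.join(dev_dir, "nvidia*"))
    return sorted(
        p for p in candidates
        if re.fullmatch(r"nvidia\d+", os.path.basename(p))
    )


def main():
    try:
        with open("/proc/modules") as f:
            modules_text = f.read()
    except OSError:
        # Not a Linux host, or /proc is unavailable: treat as "not loaded".
        modules_text = ""
    print("nvidia module loaded:", nvidia_module_loaded(modules_text))
    print("GPU device files:", nvidia_device_files())


if __name__ == "__main__":
    main()
```

If the first line prints False or the second prints an empty list, the node most likely has no working driver.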
Relu
On 2020-10-08 14:57, Sajesh Singh wrote:
Slurm 18.08
CentOS 7.7.1908
I have 2 M500 GPUs in a compute node which is defined in the
slurm.conf and gres.conf of the cluster, but if I launch a job
requesting GPUs, the environment variable CUDA_VISIBLE_DEVICES is never
set and I see the following messages in the slurmd.log file:
debug: common_gres_set_env: unable to set env vars, no device files configured
Has anyone encountered this before?
Thank you,
SS