I recently set up Slurm 2.6.5 on a cluster of Ubuntu 14.04.1 systems hosting
several NVIDIA GPUs configured as generic resources (GRES). I noticed that when
the compute nodes are rebooted, they attempt to start slurmd before the device
files created by the nvidia kernel module appear; i.e., the following message
appears in syslog some lines before the GPU kernel driver load messages:

slurmd[1453]: fatal: can't stat gres.conf file /dev/nvidia0: No such file or directory

Is there a recommended way (on Ubuntu, at least) to ensure that slurmd isn't
started before any GPU device files appear?
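
To make the question concrete, the kind of workaround I have in mind is a small
polling helper that the startup script would call before launching slurmd. This
is only a rough sketch, not something taken from the Slurm documentation: the
function name wait_for_paths and the timeout handling are my own assumptions,
and the device path is the one from the error message above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): block until every listed path
# exists, or give up once the total wait reaches the timeout.
# Usage: wait_for_paths TIMEOUT_SECONDS PATH...
wait_for_paths() {
    timeout=$1
    shift
    elapsed=0
    for path in "$@"; do
        # Poll once per second until the path shows up.
        while [ ! -e "$path" ]; do
            if [ "$elapsed" -ge "$timeout" ]; then
                echo "timed out waiting for $path" >&2
                return 1
            fi
            sleep 1
            elapsed=$((elapsed + 1))
        done
    done
    return 0
}
```

A line such as "wait_for_paths 60 /dev/nvidia0 || exit 1" placed ahead of the
command that starts slurmd would then delay startup until the device file
exists. The 60-second limit is an arbitrary choice.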
-- 
Lev Givon
Bionet Group | Neurokernel Project
http://www.columbia.edu/~lev/
http://lebedov.github.io/
http://neurokernel.github.io/
