"Hwa, George" <[email protected]> writes:

> In example gres.conf,
>
> Name=gpu File=/dev/nvidia0
>
> Does slurm actually read the device file and get information from it
> for configuration/control?

It seems to me that it at least does an existence check. If the GPU device files are not there (e.g. the drivers are not loaded), the node will appear to be down. After some reboots I've had to run 'nvidia-smi' on the node to get the /dev/nvidia? device files created. I'm running a fairly old release, so newer versions may do more.

Allan
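
For anyone who wants to run a similar existence check by hand before blaming Slurm for a down node, here is a rough Python sketch. It is not Slurm's own logic, only an approximation of the check described above. The function names `device_files` and `missing_devices` are made up for illustration, and it assumes `File=` holds either a single path or one numeric range such as /dev/nvidia[0-3].

```python
import os
import re


def device_files(gres_line):
    """Return the device paths named by File= in one gres.conf-style line."""
    match = re.search(r"\bFile=(\S+)", gres_line)
    if not match:
        return []
    spec = match.group(1)
    # Expand a single numeric range such as /dev/nvidia[0-3]. Anything more
    # elaborate is returned unexpanded (a simplification for this sketch).
    rng = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", spec)
    if rng:
        prefix, low, high, suffix = rng.groups()
        return [f"{prefix}{i}{suffix}" for i in range(int(low), int(high) + 1)]
    return [spec]


def missing_devices(gres_line, exists=os.path.exists):
    """List the device files from the line that are not present on this node."""
    return [path for path in device_files(gres_line) if not exists(path)]
```

Calling `missing_devices("Name=gpu File=/dev/nvidia0")` on the node returns an empty list when the device file is present. A non-empty result would suggest the driver has not created the files yet, which is the situation where running 'nvidia-smi' helped for me.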
