Allan, thanks for the explanation. So the name "gpu" is not that generic. It actually means something to SLURM, or its plugin.
This leads me to another related question: how do I define a generic resource that isn't associated with any real device?

Regards,
George

> -----Original Message-----
> From: Allan Streib [mailto:[email protected]]
> Sent: Wednesday, November 01, 2017 5:43 AM
> To: Hwa, George <[email protected]>; slurm-dev <slurm-[email protected]>
> Subject: [EXTERNAL]: Re: [slurm-dev] what does File=/dev/nvidai0 actually do?
>
> "Hwa, George" <[email protected]> writes:
>
> > In example gres.conf,
> >
> > Name=gpu File=/dev/nvidia0
> >
> > Does slurm actually read the device file and get information from it
> > for configuration/control?
>
> It seems to me that it at least will do an existence check. If the GPU device
> files are not there (e.g. drivers not loaded) then the node will appear to be
> down.
>
> I've had some cases after reboots where I've needed to run 'nvidia-smi'
> on the node to get the /dev/nvidia? device files created.
>
> I'm running a fairly old release so possibly newer versions do more?
>
> Allan
