Allan, thanks for the explanation.

So the name "gpu" is not that generic. It actually means something to SLURM, or 
its plugin.

This leads me to another related question: how do I define a generic resource that 
isn't associated with any real device?


Regards,
George



> -----Original Message-----
> From: Allan Streib [mailto:[email protected]]
> Sent: Wednesday, November 01, 2017 5:43 AM
> To: Hwa, George <[email protected]>; slurm-dev <slurm-
> [email protected]>
> Subject: [EXTERNAL]: Re: [slurm-dev] what does File=/dev/nvidai0 actually do?
> 
> "Hwa, George" <[email protected]> writes:
> 
> > In example gres.conf,
> >
> >    Name=gpu File=/dev/nvidia0
> >
> > Does slurm actually read the device file and get information from it
> > for configuration/control?
> 
> It seems to me that it at least will do an existence check. If the GPU device
> files are not there (e.g. drivers not loaded) then the node will appear to be
> down.
> 
> I've had some cases after reboots where I've needed to run 'nvidia-smi'
> on the node to get the /dev/nvidia? device files created.
> 
> I'm running a fairly old release so possibly newer versions do more?
> 
> Allan
