[slurm-users] One node, two partitions (gpu and cpu), can SLURM map cpu cores well?

2021-05-07 Thread Cristóbal Navarro
Hi community, I am unable to tell whether SLURM is handling the following situation efficiently in terms of CPU affinities in each partition. We have a very small cluster with just one GPU node with 8 GPUs, which offers two partitions: "gpu" and "cpu". Part of the config file: ## Nodes list ## u

Re: [slurm-users] Configless mode enabling issue

2021-05-07 Thread Will Dennis
The GPU nodes shouldn’t have any config files; in configless mode they come in from the controller (i.e. all config files are centralized). Now, did you build Slurm on the GPU nodes, or install it via a package manager? If via a package manager, do you know whether it was compiled/packaged on a node with the NVIDIA libs? (I

Re: [slurm-users] Configless mode enabling issue

2021-05-07 Thread David Henkemeyer
Thank you for the reply, Will! The gres.conf file only has one line in it: AutoDetect=nvml. While debugging, I copied this file from the GPU node to the controller. But that's when I noticed that the node without a GPU then crashed on startup. David On Fri, May 7, 2021 at 12:14 PM Will Dennis wr

Re: [slurm-users] Configless mode enabling issue

2021-05-07 Thread Will Dennis
Hi David, What is the gres.conf in the controller’s /etc/slurm? Is it autodetect via nvml? In configless mode, slurm.conf, gres.conf, etc. are maintained only on the controller, and the worker nodes get them from there automatically (you don’t want those files on the worker nodes). If you need to s

[slurm-users] Configless mode enabling issue

2021-05-07 Thread David Henkemeyer
Hello all. My team is enabling Slurm (version 20.11.5) in our environment, and we got a controller up and running, along with 2 nodes. Everything was working fine. However, when we tried to enable configless mode, we ran into a problem. The node that has a GPU is coming up in "drained" state, and s