Re: [slurm-users] GPU Allocation does not limit number of available GPUs in job

2022-10-27 Thread Sean Maxwell
No problem! Glad it is working for you now. Best, -Sean On Thu, Oct 27, 2022 at 1:46 PM Dominik Baack < dominik.ba...@cs.uni-dortmund.de> wrote: > Thank you very much! > > Those were the missing settings! > > I am not sure how I overlooked it for nearly two days, but I am happy that > it's worki

Re: [slurm-users] GPU Allocation does not limit number of available GPUs in job

2022-10-27 Thread Dominik Baack
Thank you very much! Those were the missing settings! I am not sure how I overlooked it for nearly two days, but I am happy that it's working now. Cheers Dominik Baack On 27.10.2022 at 19:23, Sean Maxwell wrote: It looks like you are missing some of the slurm.conf entries related to enforc

Re: [slurm-users] GPU Allocation does not limit number of available GPUs in job

2022-10-27 Thread Sean Maxwell
It looks like you are missing some of the slurm.conf entries related to enforcing the cgroup restrictions. I would go through the list here and verify/adjust your configuration: https://slurm.schedmd.com/cgroup.conf.html#OPT_/etc/slurm/slurm.conf Best, -Sean On Thu, Oct 27, 2022 at 1:04 PM Do
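The check Sean suggests can be sketched as a small script. The thread does not say which slurm.conf entries were missing, so the expected values below are an assumption drawn from the slurm.conf section of the cgroup.conf documentation (plus `GresTypes=gpu` for GPU scheduling); the function names are hypothetical, and the list should be adjusted to the local setup.

```python
# Hypothetical checker. EXPECTED is an assumption based on the Slurm
# cgroup.conf documentation, not on anything stated in this thread.
EXPECTED = {
    "proctracktype": "proctrack/cgroup",
    "taskplugin": "task/cgroup",
    "grestypes": "gpu",
}


def parse_conf(text):
    """Return {lowercased key: value} for lines that start with Key=Value."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def missing_cgroup_entries(text):
    """List the expected keys whose value does not include the wanted plugin."""
    settings = parse_conf(text)
    missing = []
    for key, wanted in EXPECTED.items():
        # Values such as TaskPlugin may be comma-separated lists.
        values = [v.strip().lower() for v in settings.get(key, "").split(",")]
        if wanted not in values:
            missing.append(key)
    return missing
```

Calling `missing_cgroup_entries(open("/etc/slurm/slurm.conf").read())` on a node would then return the keys that still need attention; an empty list means all three assumed entries are present.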

Re: [slurm-users] GPU Allocation does not limit number of available GPUs in job

2022-10-27 Thread Dominik Baack
Hi, yes, ConstrainDevices is set:

    ###
    # Slurm cgroup support configuration file
    ###
    CgroupAutomount=yes
    #
    #CgroupMountpoint="/sys/fs/cgroup"
    ConstrainCores=yes
    ConstrainDevices=yes
    ConstrainRAMSpace=yes
    #
    #

I attached the slurm configuration file as well. Cheers Dominik Am 27.10.2022 um 17:57 sc

Re: [slurm-users] GPU Allocation does not limit number of available GPUs in job

2022-10-27 Thread Sean Maxwell
Hi Dominik, Do you have ConstrainDevices=yes set in your cgroup.conf? Best, -Sean On Thu, Oct 27, 2022 at 11:49 AM Dominik Baack < dominik.ba...@cs.uni-dortmund.de> wrote: > Hi, > > We are in the process of setting up SLURM on some DGX A100 nodes. We > are experiencing the problem that all GP

[slurm-users] GPU Allocation does not limit number of available GPUs in job

2022-10-27 Thread Dominik Baack
Hi, We are in the process of setting up SLURM on some DGX A100 nodes. We are experiencing the problem that all GPUs are available for users, even for jobs where only one should be assigned. It seems the requirement is forwarded correctly to the node, at least CUDA_VISIBLE_DEVICES is set to
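The symptom described here can be checked from inside a job with a short script. This is a minimal sketch, not something posted in the thread: it assumes `nvidia-smi -L` prints one line starting with `GPU ` per device the process can access, and all function names are hypothetical.

```python
import os
import subprocess


def count_listed_gpus(smi_output):
    """Count the 'GPU N: ...' lines in the output of `nvidia-smi -L`."""
    return sum(1 for line in smi_output.splitlines() if line.startswith("GPU "))


def count_assigned(cuda_visible):
    """Count the entries in a CUDA_VISIBLE_DEVICES value such as '0,1'."""
    return len([d for d in cuda_visible.split(",") if d.strip()])


def is_confined(cuda_visible, smi_output):
    """True if the job sees no more GPUs than it was assigned."""
    return count_listed_gpus(smi_output) <= count_assigned(cuda_visible)


def check_current_job():
    """Run inside a job step; needs nvidia-smi on PATH (assumption)."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    listing = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    ).stdout
    return is_confined(visible, listing)
```

If `check_current_job()` returns `False` for a job that requested a single GPU, the job can see more devices than it was assigned, which matches the behaviour reported above.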