Re: [slurm-users] NVML not found when Slurm was configured.

2022-11-11 Thread Michael Lewis
Yes, sorry Rob, I mean I did build and install with --with-nvml and it didn't find it. I then tried again specifying the location. Unfortunately, at that point users needed to run a few jobs and I wasn't able to investigate further. I will get back to it when they've finished. Mike

Re: [slurm-users] NVML not found when Slurm was configured.

2022-11-11 Thread Groner, Rob
I'm not sure what you mean by "didn't work out for me". The error indicates slurm wasn't correctly configured for nvml when it was built, so the first step would be to get the slurm source and run configure --with-nvml and see what it says. There's a CHANCE the error indicates slurm can't find
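A minimal sketch of the "see what it says" step, assuming configure was run from the Slurm build directory and left the usual autoconf `config.log` there. The helper name `nvml_lines` is hypothetical and not part of Slurm; it only filters the log for lines that mention NVML so the header and library checks are easy to spot.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): print every line of a configure
# log that mentions NVML, prefixed with its line number.
# Argument 1 is the log file; it defaults to config.log in the current directory.
nvml_lines() {
    log=${1:-config.log}
    # -i: case-insensitive, so both "nvml" and "NVML" match
    # -n: show line numbers to make it easy to jump to the context in the log
    grep -in 'nvml' "$log"
}
```

For example, `nvml_lines config.log` run in the build directory lists the NVML-related checks; an empty result would suggest configure never looked for NVML at all.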

Re: [slurm-users] NVML not found when Slurm was configured.

2022-11-11 Thread Michael Lewis
Yep, they’re installed and I can get all the GPU info from nvidia-smi. Thanks, Mike From: Dj Merrill Sent: Friday, November 11, 2022 3:41:56 PM To: slurm-users@lists.schedmd.com ; Michael Lewis Subject: Re: [slurm-users] NVML not found when Slurm was configured. At the

Re: [slurm-users] NVML not found when Slurm was configured.

2022-11-11 Thread Michael Lewis
Unfortunately this didn’t work out for me or I’m simply doing it wrong. When the current users hop off the system I’ll do some more troubleshooting. Any other insight or tips to steer me in the right direction are greatly appreciated. Mike From: slurm-users on behalf of Michael Lewis Repl

Re: [slurm-users] Cgroups not constraining memory & cores

2022-11-11 Thread Sean Maxwell
Hi Sean, A couple of ideas: 1) In your original cgroup.conf you have "TaskAffinity=no", but I'm not aware of that parameter for cgroup.conf and cannot find it documented. You may want to remove it. 2) Also in cgroup.conf, you may want to try adding "ConstrainSwapSpace=yes" so that the process can

Re: [slurm-users] Cgroups not constraining memory & cores

2022-11-11 Thread Sean McGrath
Hi, Many thanks for that pointer Sean. I had missed the PrologFlags=Contain setting so have added it to slurm.conf now. I've also explicitly built slurm with pam support: ../configure --sysconfdir=/home/support/pkgs/slurm/etc --prefix=/home/support/pkgs/slurm/ubuntu_20.04/21.08.8-2 --localsta
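A small sketch for confirming that the PrologFlags=Contain change actually landed in the file. The function name `has_prolog_contain` is hypothetical, and the check is a plain text match on the file, not a query of the running daemons.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): succeed if the given slurm.conf
# contains an uncommented PrologFlags line whose value includes Contain.
# The match is case-insensitive and tolerates spaces around "=" and other
# flags in the same comma-separated list.
has_prolog_contain() {
    conf=$1
    grep -Eiq '^[[:space:]]*PrologFlags[[:space:]]*=.*Contain' "$conf"
}
```

Assuming slurm.conf lives under the `--sysconfdir` given to configure, usage would look like `has_prolog_contain /home/support/pkgs/slurm/etc/slurm.conf && echo "Contain is set"`.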

Re: [slurm-users] NVML not found when Slurm was configured.

2022-11-11 Thread Michael Lewis
Thanks Rob! No I just grabbed it through apt. I’ll try that now. Mike From: slurm-users on behalf of "Groner, Rob" Reply-To: Slurm User Community List Date: Friday, November 11, 2022 at 9:32 AM To: "slurm-users@lists.schedmd.com" Subject: Re: [slurm-users] NVML not found when Slurm was con

Re: [slurm-users] NVML not found when Slurm was configured.

2022-11-11 Thread Groner, Rob
Hi Mike, I can't tell whether you're compiling Slurm yourself. You will have to if you want the functionality. On RedHat8, I had to install cuda-nvml-devel-11-7, so find the equivalent package for Ubuntu. Basically, whatever package includes nvml.h and libnvidia-ml.so. Then, mo
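A rough sketch for checking whether the two files named above are present before rebuilding. The function name `nvml_check` is hypothetical, and the directories to search are an assumption to be supplied by the caller (for example /usr or the CUDA install prefix); nothing here is specific to a particular package.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm or CUDA): look for nvml.h and
# libnvidia-ml.so under the directories given as arguments and report the
# first match for each. Returns non-zero if either file is missing.
nvml_check() {
    status=0
    for name in nvml.h libnvidia-ml.so; do
        # Exact name match; errors from unreadable directories are ignored.
        hit=$(find "$@" -name "$name" 2>/dev/null | head -n 1)
        if [ -n "$hit" ]; then
            echo "found $name: $hit"
        else
            echo "missing $name"
            status=1
        fi
    done
    return $status
}
```

For example, `nvml_check /usr /usr/local` prints one line per file. A "missing" line suggests the development package is not installed in the searched locations, or that only a versioned library name (such as libnvidia-ml.so.1) is present.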

[slurm-users] NVML not found when Slurm was configured.

2022-11-11 Thread Michael Lewis
Hello Everyone, I'm new here and very new to Slurm, and hopefully someone can shed some light on this for me. I’m in the process of setting up a single-node Slurm environment with an NVIDIA A100. I keep getting the error: We were configured to autodetect nvml functionality, but we weren't able to find