Yes, sorry Rob. I did build and install with --with-nvml, and it didn't
find it. I then tried again, specifying the location. Unfortunately, at that
point users needed to run a few jobs, so I wasn't able to investigate further.
I will get back to it when they've finished.
Mike
__
I'm not sure what you mean by "didn't work out for me". The error indicates
slurm wasn't correctly configured for nvml when it was built, so the first step
would be to get the slurm source and run configure --with-nvml and see what it
says.
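
One way to "see what it says" is to pull the NVML-related lines out of the log that configure leaves behind. This is only a sketch: it assumes configure was run in the Slurm source tree and wrote the usual autoconf config.log there, and the helper name nvml_lines is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: show what configure recorded about NVML.
# Assumption: `./configure --with-nvml` has already been run in the Slurm
# source tree, producing the standard autoconf config.log.

# nvml_lines (hypothetical helper): print every line of a config.log that
# mentions nvml, case-insensitively. Returns non-zero if the log is missing
# or contains no such line.
nvml_lines() {
    local log="${1:-config.log}"
    if [ ! -f "$log" ]; then
        echo "no such log: $log" >&2
        return 2
    fi
    grep -i 'nvml' "$log"
}
```

After running ./configure --with-nvml, calling nvml_lines config.log lists the checks configure made for the header and library, which should show whether it found them.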
There's a CHANCE the error indicates slurm can't find
Yep, they’re installed and I can get all the GPU info from nvidia-smi.
Thanks,
Mike
From: Dj Merrill
Sent: Friday, November 11, 2022 3:41:56 PM
To: slurm-users@lists.schedmd.com; Michael Lewis
Subject: Re: [slurm-users] NVML not found when Slurm was configured.
At the
Unfortunately this didn’t work out for me or I’m simply doing it wrong. When
the current users hop off the system I’ll do some more troubleshooting. Any
other insight or tips to steer me in the right direction are greatly
appreciated.
Mike
From: slurm-users on behalf of Michael Lewis
Repl
Hi Sean,
A couple ideas:
1) In your original cgroup.conf you have "TaskAffinity=no", but I'm not
aware of that parameter for cgroup.conf and cannot find it documented. You
may want to remove it.
2) Also in cgroup.conf, you may want to try adding
"ConstrainSwapSpace=yes" so that the process can
Hi,
Many thanks for that pointer Sean. I had missed the PrologFlags=Contain setting
so have added it to slurm.conf now.
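
A quick way to double-check that the setting is really in the file is to grep for it. This is a sketch only: the helper name prolog_flags and the default path are assumptions, and the real slurm.conf lives wherever --sysconfdir pointed at build time.

```shell
#!/usr/bin/env bash
# prolog_flags (hypothetical helper): print the active PrologFlags line(s)
# from a slurm.conf, skipping commented-out lines.
# The default path below is an assumption; pass the real one as $1.
prolog_flags() {
    local conf="${1:-/etc/slurm/slurm.conf}"
    grep -iE '^[[:space:]]*PrologFlags=' "$conf"
}
```

Once the daemons have been restarted, scontrol show config should also list the PrologFlags value they are actually running with.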
I've also explicitly built Slurm with PAM support:
../configure --sysconfdir=/home/support/pkgs/slurm/etc
--prefix=/home/support/pkgs/slurm/ubuntu_20.04/21.08.8-2
--localsta
Thanks Rob! No I just grabbed it through apt. I’ll try that now.
Mike
From: slurm-users on behalf of "Groner, Rob"
Reply-To: Slurm User Community List
Date: Friday, November 11, 2022 at 9:32 AM
To: "slurm-users@lists.schedmd.com"
Subject: Re: [slurm-users] NVML not found when Slurm was con
Hi Mike,
I can't tell whether you're compiling Slurm yourself. You will have to if
you want the functionality.
On RedHat8, I had to install cuda-nvml-devel-11-7, so find what the equivalent
is for that in Ubuntu. Basically, whatever package includes nvml.h and
libnvidia-ml.so. Then, mo
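
Before rebuilding, it can help to confirm that both files named above are actually on the system. A rough sketch, assuming a bash shell and a find that supports -quit (GNU and BSD find both do); the helper name find_nvml and the default search prefix /usr are assumptions, and CUDA packages often install elsewhere.

```shell
#!/usr/bin/env bash
# find_nvml (hypothetical helper): report whether a directory tree contains
# the NVML header and shared library.
# The default prefix /usr is an assumption; pass another directory as $1.
find_nvml() {
    local prefix="${1:-/usr}"
    local header lib
    # -print -quit stops at the first match; "|| true" ignores errors from
    # unreadable directories.
    header=$(find "$prefix" -name 'nvml.h' -print -quit 2>/dev/null || true)
    lib=$(find "$prefix" -name 'libnvidia-ml.so*' -print -quit 2>/dev/null || true)
    echo "nvml.h: ${header:-not found}"
    echo "libnvidia-ml.so: ${lib:-not found}"
    # Succeed only when both files were found.
    [ -n "$header" ] && [ -n "$lib" ]
}
```

If find_nvml reports either file as not found, the development package is probably still missing, or it is installed under a prefix other than the one searched.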
Hello Everyone,
I'm new here and very new to Slurm, and hopefully someone can shed some light on
this for me. I’m in the process of setting up a single-node Slurm environment
with an NVIDIA A100. I keep getting the error: We were configured to autodetect
nvml functionality, but we weren't able to find