[slurm-users] Re: Listen to job state changes

2024-11-13 Thread Ole Holm Nielsen via slurm-users
On 11/12/24 20:25, egonle--- via slurm-users wrote: is there any way to listen to job state changes of slurm 23.x or newer? I’d like to kind of subscribe to job state changes instead of polling for job states. Adding this feature to the slurm accounting DB seems to be the last option right now, althoug
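
For context, a minimal sketch of Slurm's own event hooks that avoid polling; the job id and script paths below are placeholders, not something from the thread:

    # Per-job trigger: run a program once the job reaches a terminal state.
    strigger --set --jobid=12345 --fini --program=/usr/local/sbin/notify_job_done.sh

    # Cluster-wide alternative in slurm.conf: a job-completion script that
    # slurmctld invokes for every finished job.
    #   JobCompType=jobcomp/script
    #   JobCompLoc=/usr/local/sbin/jobcomp_notify.sh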

[slurm-users] Re: Listen to job state changes

2024-11-13 Thread Cutts, Tim via slurm-users
I suppose you could tail the slurmd log and put those events into a RabbitMQ instance or something like that. Tim -- Tim Cutts Scientific Computing Platform Lead AstraZeneca Find out more about R&D IT Data, Analytics & AI and how we can support you by visiting our Service Catalogue
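
A rough sketch of Tim's idea; the log path, filter pattern, and routing key are all assumptions to adapt locally:

    # Follow the controller log and publish matching lines to RabbitMQ.
    # Adjust the grep pattern to whatever your slurmctld actually logs.
    tail -F /var/log/slurm/slurmctld.log \
      | grep --line-buffered -E 'JobId=.*(COMPLETED|FAILED|CANCELLED)' \
      | while read -r line; do
          rabbitmqadmin publish exchange=amq.default routing_key=slurm.jobs payload="$line"
        done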

[slurm-users] Re: First setup of slurm with a GPU node

2024-11-13 Thread Jason Simms via slurm-users
Hello Patrick, Yeah I'd recommend upgrading, and I imagine most others will, too. I have found with Slurm that upgrades are nearly mandatory, at least annually or so, mostly because it's more challenging to upgrade from much older versions and requires bootstrapping. Not sure about the minus sign;

[slurm-users] Re: First setup of slurm with a GPU node

2024-11-13 Thread Patrick Begou via slurm-users
Hi Benjamin, Yes, I saw this in an archived discussion too and I've added these parameters. A little bit tricky to do as my setup is deployed via Ansible. But with this setup I'm not able to request a GPU at all. All these tests are failing and Slurm does not accept the job: srun -n 1 -p tenibr
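
For reference, the usual ways to request a GPU once the node's Gres= definition is accepted; the partition and GRES type names follow the thread, everything else is illustrative:

    srun -n 1 -p tenibre-gpu --gres=gpu:1 nvidia-smi -L
    srun -n 1 -p tenibre-gpu --gres=gpu:A100-40:1 nvidia-smi -L
    # newer, equivalent syntax:
    srun -n 1 -p tenibre-gpu --gpus=1 nvidia-smi -L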

[slurm-users] Re: [External] Re: First setup of slurm with a GPU node

2024-11-13 Thread Henk Meij via slurm-users
Yes, I noticed this changed behavior too since v22 (testing v24 now). The gres definitions in gres.conf are ignored but must be in slurm.conf. My gres.conf file now only has: NodeName=n[79-90] AutoDetect=nvml -Henk From: Benjamin Smith via slurm-users Sent: Wednes
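
A compact sketch of the pairing Henk describes; node names and the GPU count are examples only:

    # gres.conf: let slurmd autodetect the devices via NVML
    NodeName=n[79-90] AutoDetect=nvml
    # slurm.conf: declare the GRES on the node definition (other node
    # attributes such as CPUs and RealMemory stay on the same line)
    GresTypes=gpu
    NodeName=n[79-90] Gres=gpu:4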

[slurm-users] Re: First setup of slurm with a GPU node

2024-11-13 Thread Benjamin Smith via slurm-users
Hi Patrick, You're missing a Gres= on your node in your slurm.conf: Nodename=tenibre-gpu-0 RealMemory=257270 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN Gres=gpu:A100-40:1,gpu:A100-80:1 Ben On 13/11/2024 16:00, Patrick Begou via slurm-users wrote: This email was sent to you by
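
A hedged sketch of gres.conf entries that would pair with that Gres= line; the device files are assumptions, and Henk's AutoDetect=nvml approach can replace them:

    NodeName=tenibre-gpu-0 Name=gpu Type=A100-40 File=/dev/nvidia0
    NodeName=tenibre-gpu-0 Name=gpu Type=A100-80 File=/dev/nvidia1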

[slurm-users] Re: First setup of slurm with a GPU node

2024-11-13 Thread Patrick Begou via slurm-users
On 13/11/2024 at 15:45, Roberto Polverelli Monti via slurm-users wrote: Hello Patrick, On 11/13/24 12:01 PM, Patrick Begou via slurm-users wrote: As usage of this GPU resource increases I would like to manage it with Gres to avoid usage conflicts. But at this time my setup does not work

[slurm-users] Re: [EXTERN] Re: Slurm and NVIDIA NVML

2024-11-13 Thread Matthias Leopold via slurm-users
Hi Josh, thanks for the reply, that's very helpful. I used the exact same compilation setup as you did, I could have mentioned that. But this gives extra confidence. So I will just accept the current situation and test it as soon as I have GPUs available. Best, Matthias On 13.11.24 at 13:58, Jos

[slurm-users] Re: First setup of slurm with a GPU node

2024-11-13 Thread Roberto Polverelli Monti via slurm-users
Hello Patrick, On 11/13/24 12:01 PM, Patrick Begou via slurm-users wrote: As usage of this GPU resource increases I would like to manage it with Gres to avoid usage conflicts. But at this time my setup does not work, as I can reach a GPU without reserving it: srun -n 1 -p tenibre-gpu
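
The quoted symptom (a job can see the GPUs without requesting them) is usually addressed by device confinement through cgroups; a hedged sketch, not something stated in the truncated reply:

    # cgroup.conf
    ConstrainDevices=yes
    # slurm.conf
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup,task/affinity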

[slurm-users] Re: Slurm and NVIDIA NVML

2024-11-13 Thread Joshua Randall via slurm-users
Hi Matthias, Just another user here, but we did notice similar behaviour on our cluster with NVIDIA GPU nodes. For this cluster, we built slurm 24.05.1 deb packages from source ourselves on Ubuntu 22.04 with the `libnvidia-ml-dev` package installed directly from the Ubuntu package archive (using t
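
A hedged sketch of the key ingredient Josh mentions, namely having the NVML headers available when Slurm is configured; paths and options are illustrative, not his exact build recipe:

    apt-get install -y libnvidia-ml-dev
    ./configure --prefix=/usr --sysconfdir=/etc/slurm --with-nvml=/usr
    make -j"$(nproc)" && make install
    # the NVML GPU plugin should then appear under the Slurm plugin
    # directory, e.g. as gpu_nvml.so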

[slurm-users] First setup of slurm with a GPU node

2024-11-13 Thread Patrick Begou via slurm-users
Hi, I'm using Slurm on a small 8-node cluster. I've recently added one GPU node with two Nvidia A100s, one with 40 GB of RAM and one with 80 GB. As usage of this GPU resource increases I would like to manage it with Gres to avoid usage conflicts. But at this time my setup does not work, as

[slurm-users] Slurm and NVIDIA NVML

2024-11-13 Thread Matthias Leopold via slurm-users
Hi, I'm trying to compile Slurm with NVIDIA NVML support, but the result is unexpected. I get /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so, but when I do "ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so" there is no reference to /lib/x86_64-linux-gnu/libnvidia-ml.so.1 (which I would expect).
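
A few hedged checks that can tell whether the plugin will find NVML at run time; if the plugin opens libnvidia-ml.so.1 dynamically rather than linking against it, ldd will not list it, so the missing entry is not by itself proof that NVML support is absent:

    ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so | grep -i nvidia
    strings /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so | grep -i libnvidia-ml
    # on a GPU node, slurmd can print the GRES it autodetects:
    slurmd -G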

[slurm-users] Re: [External] Re: InvalidAccount

2024-11-13 Thread Ole Holm Nielsen via slurm-users
Hi Henk, On 11/12/24 15:36, Henk Meij wrote: Ole, I had not made that connection yet ... The *required* part. Could be documented a bit more clearly, if true. I've opened a case with SchedMD to make the documentation of AccountingStorageType clearer - may be in Slurm 24.11. Small institutio
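
For context, a minimal hedged sketch of the accounting settings under discussion; the host name is a placeholder, and AccountingStorageEnforce is included only because association enforcement is what typically turns unknown accounts into InvalidAccount rejections:

    AccountingStorageType=accounting_storage/slurmdbd
    AccountingStorageHost=slurmdbd.example.org
    AccountingStorageEnforce=associations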