On 11/12/24 20:25, egonle--- via slurm-users wrote:
Is there any way to listen to job state changes in Slurm 23.x or newer?
I'd like to subscribe to job state changes instead of polling for
job states.
Adding this feature to the Slurm accounting DB seems to be the last option
right now, although
I suppose you could tail the slurmd log and put those events into a RabbitMQ
instance or something like that
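A rough, untested sketch of that idea, assuming the controller log lives at
/var/log/slurm/slurmctld.log and that rabbitmqadmin is available (both are
assumptions, and the log format is not a stable interface):

  # forward job-related controller log lines to a RabbitMQ queue (sketch)
  tail -F /var/log/slurm/slurmctld.log \
    | grep --line-buffered 'JobId=' \
    | while read -r line; do
        rabbitmqadmin publish exchange=amq.default \
          routing_key=slurm-job-events payload="$line"
      done

Note that this only sees whatever the controller happens to log; the
jobcomp/script plugin (JobCompType=jobcomp/script) is another hook, though
it only fires when a job completes.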
Tim
--
Tim Cutts
Scientific Computing Platform Lead
AstraZeneca
Hello Patrick,
Yeah, I'd recommend upgrading, and I imagine most others will, too. I have
found with Slurm that upgrades are nearly mandatory at least annually or
so, mostly because upgrading from much older versions is more challenging
and requires bootstrapping. Not sure about the minus sign;
Hi Benjamin,
Yes, I saw this in an archived discussion too and I've added these
parameters. A little bit tricky to do as my setup is deployed via
Ansible. But with this setup I'm not able to request a GPU at all. All
these tests are failing and Slurm does not accept the job:
srun -n 1 -p tenibr
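For reference, a request that names the GRES explicitly would look roughly
like the line below; this is only a sketch, with the partition and GPU type
names borrowed from other messages in the thread:

  srun -n 1 -p tenibre-gpu --gres=gpu:A100-40:1 nvidia-smi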
Yes, I noticed this changed behavior too since v22 (testing v24 now).
The gres definitions in gres.conf are ignored but must be in slurm.conf.
My gres.conf file now only has:
NodeName=n[79-90] AutoDetect=nvml
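A minimal sketch of how the two files end up split under that behavior (the
GPU count and type here are placeholders, not the real values for these
nodes):

  # slurm.conf -- gres must be declared here
  GresTypes=gpu
  NodeName=n[79-90] Gres=gpu:4

  # gres.conf -- NVML autodetect fills in types and device files
  NodeName=n[79-90] AutoDetect=nvml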
-Henk
Hi Patrick,
You're missing a Gres= on your node in your slurm.conf:
Nodename=tenibre-gpu-0 RealMemory=257270 Sockets=2 CoresPerSocket=16
ThreadsPerCore=1 State=UNKNOWN Gres=gpu:A100-40:1,gpu:A100-80:1
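For completeness, the matching gres.conf entries, if written out by hand
rather than autodetected, would look something like the lines below; the
/dev/nvidia* paths are assumptions and should be checked on the node:

  NodeName=tenibre-gpu-0 Name=gpu Type=A100-40 File=/dev/nvidia0
  NodeName=tenibre-gpu-0 Name=gpu Type=A100-80 File=/dev/nvidia1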
Ben
On 13/11/2024 16:00, Patrick Begou via slurm-users wrote:
On 13/11/2024 at 15:45, Roberto Polverelli Monti via slurm-users wrote:
Hello Patrick,
On 11/13/24 12:01 PM, Patrick Begou via slurm-users wrote:
As usage of this GPU resource increases, I would like to manage this
resource with Gres to avoid usage conflicts. But at this time my setup
does not work
Hi Josh,
thanks for the reply, that's very helpful. I used the exact same compilation
setup as you did; I could have mentioned that. But this gives extra
confidence. So I will just accept the current situation and test it as soon
as I have GPUs available.
Best,
Matthias
On 13.11.24 at 13:58, Jos wrote:
Hello Patrick,
On 11/13/24 12:01 PM, Patrick Begou via slurm-users wrote:
As usage of this GPU resource increases, I would like to manage this resource
with Gres to avoid usage conflicts. But at this time my setup does not
work, as I can reach a GPU without reserving it:
srun -n 1 -p tenibre-gpu
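Being able to see a GPU from a job that never requested one is usually a
device-cgroup question rather than a GRES definition question. A minimal
sketch of the relevant settings, assuming (an assumption, not something
stated in the thread) that cgroup confinement is not yet enabled:

  # cgroup.conf
  ConstrainDevices=yes

  # slurm.conf -- the cgroup task plugin must be loaded for this to apply
  TaskPlugin=task/cgroup,task/affinity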
Hi Matthias,
Just another user here, but we did notice similar behaviour on our cluster
with NVIDIA GPU nodes. For this cluster, we built slurm 24.05.1 deb
packages from source ourselves on Ubuntu 22.04 with the `libnvidia-ml-dev`
package installed directly from the Ubuntu package archive (using
t
Hi,
I'm using Slurm on a small 8-node cluster. I've recently added one GPU
node with two Nvidia A100s, one with 40 GB of RAM and one with 80 GB.
As usage of this GPU resource increases, I would like to manage this resource
with Gres to avoid usage conflicts. But at this time my setup does not
work, as
Hi,
I'm trying to compile Slurm with NVIDIA NVML support, but the result is
unexpected. I get /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so, but when
I do "ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so" there is no
reference to /lib/x86_64-linux-gnu/libnvidia-ml.so.1 (which I would
expect).
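A few generic checks (not from the thread) that can help tell whether NVML
support actually made it into the build; the paths are assumptions:

  # did configure find NVML? (run in the build tree after ./configure)
  grep -i nvml config.log
  # is the plugin installed?
  ls /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so
  # on a GPU node, ask slurmd which GRES it autodetects
  slurmd -G

If the plugin loads libnvidia-ml dynamically at run time, ldd will not list
it, so an empty ldd output is not by itself proof that the build is broken.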
Hi Henk,
On 11/12/24 15:36, Henk Meij wrote:
Ole, I had not made that connection yet ... The *required* part. Could be
documented a bit more clearly, if true.
I've opened a case with SchedMD to make the documentation of
AccountingStorageType clearer; the fix may land in Slurm 24.11.
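For anyone following along, the setting under discussion boils down to a
couple of slurm.conf lines plus a running slurmdbd; a minimal sketch, with a
placeholder host name:

  AccountingStorageType=accounting_storage/slurmdbd
  AccountingStorageHost=dbd.example.org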
Small institutio