[slurm-users] Re: Jobs pending with reason "priority" but nodes are idle

2024-09-25 Thread Renfro, Michael via slurm-users
Since nobody replied after this: if the nodes are incapable of running the jobs due to insufficient resources, the default "EnforcePartLimits=No" [1] might be the issue. It allows a job to stay queued even if it is impossible to run. [1] https://slurm.schedmd.com/slurm.conf.
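A minimal sketch of the setting being discussed, assuming a generic slurm.conf (the value shown is one of the documented options, not necessarily the right choice for every site):

```
# slurm.conf -- hypothetical fragment
# With the default EnforcePartLimits=NO, a job whose resource request can
# never be satisfied by the partition is still accepted and sits pending.
# ALL (or ANY) makes slurmctld reject such jobs at submit time instead.
EnforcePartLimits=ALL
```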

[slurm-users] A note on updating Slurm from 23.02 to 24.05 & multi-cluster

2024-09-25 Thread Ward Poelmans via slurm-users
Hi all, We hit a snag when updating our clusters from Slurm 23.02 to 24.05. After updating the slurmdbd, our multi cluster setup was broken until everything was updated to 24.05. We had not anticipated this. SchedMD says that fixing it would be a very complex operation. Hence, this warning to

[slurm-users] Re: Max TRES per user and node

2024-09-25 Thread Carsten Beyer via slurm-users
Hi Guillaume, as Rob already mentioned, this could maybe be a way for you (a partition just created temporarily online for testing). You could also add MaxTRES=node=1 for more restrictions. We do something similar with a QOS to restrict the number of CPUs per user in certain partitions.
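A sketch of the QOS approach mentioned above, with hypothetical QOS/partition names and an arbitrary CPU cap; the limit takes effect once the partition references the QOS:

```
# Hypothetical sacctmgr command: create a QOS capping each user at 64 CPUs
sacctmgr add qos cpucap set MaxTRESPerUser=cpu=64

# slurm.conf -- attach the QOS to the partition so the cap applies there:
#   PartitionName=batch QOS=cpucap ...
```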

[slurm-users] Re: Max TRES per user and node

2024-09-25 Thread Groner, Rob via slurm-users
The trick, I think (and Guillaume can certainly correct me), is that the aim is to allow the user to run as many (up to) 200G mem jobs as they want, so long as they do not consume more than 200G on any single node. So, they could run 10 200G jobs on 10 different nodes. So the mem limit isn
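The closest built-in approximation of Carsten's suggestion can be sketched as below (hypothetical QOS name). Note the caveat: this caps the user at one node total within the partition, which is not the same as "up to 200G on each of many nodes" described above:

```
# Hypothetical sacctmgr command: each user limited to 1 node and 200G
# of memory in any partition that enforces this QOS. This does NOT
# express a per-user *per-node* memory limit across multiple nodes.
sacctmgr add qos onenode set MaxTRESPerUser=node=1,mem=200G
```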

[slurm-users] Re: Max TRES per user and node

2024-09-25 Thread Paul Raines via slurm-users
I am pretty sure there is no way to do exactly a per-user per-node limit in SLURM. I cannot think of a good reason why one would do this. Can you explain? I don't see why it matters if you have two users submitting two 200G jobs if the jobs for the users are spread out over two nodes rather t

[slurm-users] Re: Max TRES per user and node

2024-09-25 Thread Guillaume COCHARD via slurm-users
Hello, Thank you all for your answers. Carsten, as said by Rob we need a limit per node, not only per user. Paul, we know we are asking for something quite unorthodox. The thing is, we overbook the memory on our cluster (i.e., if a worker has 200G of memory, Slurm can allocate up to 280G on it
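One common way to express the overbooking described here is to declare more memory in the node definition than the hardware physically has; a hypothetical fragment (node names, CPU count, and the 1.4x factor are illustrative, taken from the 200G/280G figures above):

```
# Hypothetical slurm.conf fragment: the worker physically has ~200G,
# but RealMemory is declared as 280G (a 1.4x overcommit), so Slurm's
# scheduler may place up to 280G worth of job allocations on it.
NodeName=worker[01-10] CPUs=64 RealMemory=280000 State=UNKNOWN
```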