[slurm-users] Overhead of multiple concurrent job steps

2024-03-15 Thread Mehta, Kshitij via slurm-users
Hello. We have a use case in which we need to launch multiple concurrently running MPI applications inside a job allocation. Most supercomputing facilities limit the number of concurrent job steps because they incur overhead on the global Slurm scheduler. Some frameworks, such as the Flux framew
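
For context, a minimal sketch of what such a workload can look like inside one allocation; the application names and task counts below are hypothetical:

#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks=128

# Each srun below starts a separate job step; "&" runs them concurrently
# and "wait" keeps the batch script alive until every step has finished.
srun --ntasks=32 --exact ./app_a &
srun --ntasks=32 --exact ./app_b &
srun --ntasks=64 --exact ./app_c &
wait

Each srun creates its own job step, which is where the per-step scheduler overhead mentioned above comes from.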

[slurm-users] Slurm suspend preemption not working

2024-03-15 Thread Verma, Nischey (HPC ENG,RAL,LSCI) via slurm-users
Hi All, we are trying to implement preemption in one of our partitions so we can run priority jobs on it, suspend the jobs already running on the partition, and resume them once the priority job is done. We have read through the Slurm documentation and applied the configuration, but somehow we cannot make it
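
A minimal slurm.conf sketch of suspend-based preemption between two partitions sharing the same nodes; partition and node names are hypothetical, assuming partition-priority preemption:

# Gang scheduling is required when using PreemptMode=SUSPEND
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG

# Jobs in "prio" suspend jobs in "normal" because of the higher PriorityTier
PartitionName=normal Nodes=node[01-10] PriorityTier=1  Default=YES
PartitionName=prio   Nodes=node[01-10] PriorityTier=10 PreemptMode=OFF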

[slurm-users] Re: Issues with Slurm 23.11.1

2024-03-15 Thread Fokke Dijkstra via slurm-users
This is a quick update on the status. Upgrading to Slurm 23.11.4 fixed the issue. It appears we were bitten by the following bug: "Fix stuck processes and incorrect environment when using --get-user-env". This was triggered for us because we had set SBATCH_EXPORT=NONE for our users. Kind regards,

[slurm-users] Re: Slurm suspend preemption not working

2024-03-15 Thread Williams, Jenny Avis via slurm-users
CPUs are released, but memory is not released on suspend. Try looking at this output and compare allocated memory before and after suspending a job on a node:

sinfo -N -n yourNode --Format=weight:8,nodelist:15,cpusstate:12,memory:8,allocmem:8
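
A hedged walk-through of that check; the node name and job ID are hypothetical:

# Allocated memory before the suspend
sinfo -N -n node001 --Format=weight:8,nodelist:15,cpusstate:12,memory:8,allocmem:8

# Suspend a running job on that node, then look again
scontrol suspend 12345
sinfo -N -n node001 --Format=weight:8,nodelist:15,cpusstate:12,memory:8,allocmem:8

The CPUS(A/I/O/T) column should show the job's CPUs back in the idle count, while ALLOCMEM typically stays unchanged because the memory is not released.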

[slurm-users] Re: Slurm suspend preemption not working

2024-03-15 Thread Josef Dvoracek via slurm-users
I think you need to set a reasonable "DefMemPerCPU"; otherwise jobs take all of the memory by default, and there is no memory left for the second job. We calculated DefMemPerCPU in such a way that the default allocated memory of a full node is slightly under half of the total node memory. So there i
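
A small sketch of that sizing with hypothetical numbers, assuming a 32-core node with 128000 MB of RealMemory:

# slurm.conf fragment; node definition and values are illustrative only
NodeName=node[01-16] CPUs=32 RealMemory=128000

# 32 CPUs * 1900 MB = 60800 MB, slightly under half of 128000 MB,
# so a full-node job using the default plus a preempting full-node job still fit in memory.
DefMemPerCPU=1900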