Hello,
We have a use case in which we need to launch multiple concurrently running MPI
applications inside a job allocation. Most supercomputing facilities limit the
number of concurrent job steps, since each step adds overhead on the global Slurm
scheduler. Some frameworks, such as the Flux framework
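To illustrate the pattern we mean, here is a rough sketch of packing two MPI
applications into one allocation as concurrent job steps (the application names
and task counts are placeholders, not our real workload):

    #!/bin/bash
    #SBATCH --nodes=2
    #SBATCH --ntasks=8

    # two MPI applications as concurrent job steps inside the same allocation
    srun --ntasks=4 --exact ./app_a &
    srun --ntasks=4 --exact ./app_b &
    wait

Each backgrounded srun is its own job step, which is exactly what runs into the
limits on concurrent steps mentioned above.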
Hi All,
We are trying to implement preemption on one of our partitions so that we can run
priority jobs on it, suspend the jobs already running on the partition, and resume
them once the priority job is done. We have read through the Slurm documentation and
did the configuration, but somehow we cannot make it
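A sketch of the kind of configuration we are attempting (partition and node
names here are placeholders, not our real setup):

    # slurm.conf
    PreemptType=preempt/partition_prio
    PreemptMode=SUSPEND,GANG
    PartitionName=standard Nodes=node[01-16] PriorityTier=1 Default=YES
    PartitionName=urgent   Nodes=node[01-16] PriorityTier=2

With partition-priority preemption, jobs in the higher PriorityTier partition can
suspend jobs in the lower one; SUSPEND requires GANG to be listed in PreemptMode.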
This is a quick update on the status. Upgrading to Slurm 23.11.4 fixed the
issue. It appears we were bitten by the following bug:
-- Fix stuck processes and incorrect environment when using --get-user-env
This was triggered for us because we had set SBATCH_EXPORT=NONE for our
users.
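For reference, a rough sketch of the combination that seemed to trigger it for
us (the actual job here is just a placeholder):

    # set site-wide for our users; this is what triggered the code path
    export SBATCH_EXPORT=NONE

    # on the affected versions, submissions using --get-user-env could hang
    # or end up with an incorrect environment
    sbatch --get-user-env --wrap="env | sort"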
Kind regards,
CPUs are released, but memory is not released on suspend. Try looking at this
output and comparing the allocated memory before and after suspending a job on a node:
sinfo -N -n yourNode \
  --Format=weight:8,nodelist:15,cpusstate:12,memory:8,allocmem:8
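For example, a sequence like this makes the difference visible (replace <jobid>
with a job running on that node):

    sinfo -N -n yourNode --Format=cpusstate:12,memory:8,allocmem:8
    scontrol suspend <jobid>
    sinfo -N -n yourNode --Format=cpusstate:12,memory:8,allocmem:8   # AllocMem unchanged
    scontrol resume <jobid>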
From: Verma, Nischey (HPC ENG,RAL,LSCI) via slurm-users
I think you need to set a reasonable "DefMemPerCPU" - otherwise jobs will
take all of the memory by default, and there is no memory remaining for a
second job.
We calculated DefMemPerCPU in such a way that the default allocated
memory of a full node is slightly under half of the total node memory. So
there i
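As an illustration only (a hypothetical node with 128 cores and 512000 MB of
memory, not our real hardware), that calculation would look like:

    # slurm.conf
    # 1900 MB/CPU * 128 CPUs = 243200 MB, slightly under half of 512000 MB
    DefMemPerCPU=1900
    NodeName=node[01-10]   CPUs=128 RealMemory=512000
    PartitionName=batch    Nodes=node[01-10] Default=YES

A job that does not request memory then gets DefMemPerCPU per allocated CPU, so
even while one full-node job sits suspended (with its memory still allocated), a
second full-node job using the default request can still fit in memory.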