Juergen Salk <juergen.s...@uni-ulm.de> writes:

> that is interesting. We have a very similar setup as well. However, in
> our Slurm test cluster I have noticed that it is not the *job* that
> gets killed. Instead, the OOM killer terminates one (or more)
> *processes*

Yes, that is how the kernel OOM killer works. This is why we always tell
users to use "set -o errexit" in their job scripts. Then at least the
job script exits as soon as one of its processes is killed.

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
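
To illustrate the effect of "set -o errexit", here is a minimal sketch in
bash. It is not taken from a real job script: the OOM kill is only
simulated by a child shell sending SIGKILL to itself, and the variable
names and the "next step ran" message are made up for the demonstration.

```shell
#!/bin/bash
# Simulated OOM kill: the inner shell sends SIGKILL to itself, which is
# the signal the kernel OOM killer uses. Everything here is illustrative.

# Without errexit: the script carries on after the killed step.
without=$(bash -c 'bash -c "kill -KILL \$\$"; echo "next step ran"' 2>/dev/null)

# With errexit: the script stops at the killed step and passes on its
# exit status, 128 + 9 (the signal number of SIGKILL).
status=0
with=$(bash -c 'set -o errexit; bash -c "kill -KILL \$\$"; echo "next step ran"' 2>/dev/null) || status=$?

echo "without errexit: '${without}'"
echo "with errexit:    '${with}' (exit status ${status})"
```

Note that errexit does not cover every case: a failing command used as an
"if" condition, on the left-hand side of "&&" or "||", or in a non-final
position of a pipeline (unless "set -o pipefail" is also set) does not
make the script exit.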