Juergen Salk <juergen.s...@uni-ulm.de> writes:

> that is interesting. We have a very similar setup as well. However, in
> our Slurm test cluster I have noticed that it is not the *job* that
> gets killed. Instead, the OOM killer terminates one (or more)
> *processes*

Yes, that is how the kernel OOM killer works.

This is why we always tell users to use "set -o errexit" in their job
scripts. Then at least the job script exits as soon as one of its
processes is killed, because the killed process returns a non-zero
exit status.
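A minimal sketch of the effect, assuming a POSIX shell. The #SBATCH values and the simulate_oom_kill helper are hypothetical. The helper stands in for an OOM-killed step by having a child shell send SIGKILL to itself, which the parent sees as exit status 128 + 9 = 137:

```bash
#!/bin/bash
#SBATCH --job-name=errexit-demo   # hypothetical job name
#SBATCH --mem=1G                  # hypothetical memory request

# Hypothetical stand-in for a step the kernel OOM killer terminates:
# the child shell kills itself with SIGKILL, so its status is 137.
simulate_oom_kill() {
    sh -c 'kill -KILL $$'
}

# Job body WITHOUT errexit: the killed step is ignored and the
# script carries on to the next step.
without_errexit=$(
    simulate_oom_kill
    echo "post-processing ran"
)

# Job body WITH errexit: the shell exits as soon as the killed step
# returns a non-zero status, so the next step never runs.
with_errexit=$(
    set -o errexit
    simulate_oom_kill
    echo "post-processing ran"
)
status=$?

echo "without errexit: '${without_errexit}'"
echo "with errexit:    '${with_errexit}' (exit status ${status})"
```

The command substitutions are there only so one script can compare both cases. In a real job script you would simply put "set -o errexit" near the top. Note that errexit does not fire for commands used as an if condition or inside && / || lists.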

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
