Marcus Boden <mbo...@gwdg.de> writes:

> you're looking for KillOnBadExit in the slurm.conf:
> KillOnBadExit
[...]
> this should terminate the job if a step or a process gets oom-killed.

That is a good tip! But as I read the documentation (I haven't tested
it), it will only kill the job step itself, not the whole job. Also, it
will only have an effect on things started with srun, mpirun or similar.

However, in combination with "set -o errexit", I believe most OOM kills
would get the job itself terminated.

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
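To illustrate the errexit point: here is a minimal local sketch that runs with plain bash and does not need Slurm. It simulates an OOM kill by having a command send itself SIGKILL, which is the signal the kernel OOM killer uses. It assumes that in a real job the killed command would be an srun step that exits with a non-zero status after one of its tasks is OOM-killed. `my_program` in the comments is a hypothetical name.

```shell
#!/usr/bin/env bash
# Local simulation of "errexit + OOM kill"; needs only bash, not Slurm.

# Stand-in for a job script. The middle command kills itself with SIGKILL,
# so the calling shell sees exit status 128 + 9 = 137. In a real job that
# line would be an srun step such as `srun ./my_program` (hypothetical
# program name), assumed to exit non-zero after an OOM kill.
job_script=$(cat <<'EOF'
set -o errexit
echo "step 1 starting"
bash -c 'kill -KILL $$'
echo "step 2 starting"
EOF
)

# With errexit: the script stops at the killed command and exits with
# that command's status, so the job would end as failed.
status=0
output=$(bash -c "$job_script" 2>/dev/null) || status=$?

# Without errexit (first line stripped): the failure is ignored, the
# script runs to the end and exits 0, so the job would carry on.
status_no_errexit=0
output_no_errexit=$(bash -c "${job_script#set -o errexit}" 2>/dev/null) || status_no_errexit=$?

echo "with errexit:    status=$status"
echo "without errexit: status=$status_no_errexit"
```

Running it should report status 137 with errexit and 0 without it; only the first variant stops before "step 2".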