Hi Jürgen,

You're looking for KillOnBadExit in slurm.conf. The man page describes it as follows:
KillOnBadExit
    If set to 1, a step will be terminated immediately if any task is crashed
    or aborted, as indicated by a non-zero exit code. With the default value of 0,
    if one of the processes is crashed or aborted the other processes will continue
    to run while the crashed or aborted process waits. The user can override this
    configuration parameter by using srun's -K, --kill-on-bad-exit.

This should terminate the whole step, rather than just the affected process,
when one of its tasks gets OOM-killed.
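
For reference, a minimal sketch of how the setting could look, assuming you
want the behaviour cluster-wide rather than per job (the comments are mine,
not taken from the man page):

```conf
# slurm.conf (fragment)
# Terminate the whole step as soon as any of its tasks exits with a
# non-zero code; the default of 0 lets the remaining tasks keep running.
KillOnBadExit=1
```

Users can still decide per step with srun's -K, --kill-on-bad-exit option,
as mentioned in the man page excerpt above.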

Best,
Marcus

On 19-10-08 10:36, Juergen Salk wrote:
> * Bjørn-Helge Mevik <b.h.me...@usit.uio.no> [191008 08:34]:
> > Jean-mathieu CHANTREIN <jean-mathieu.chantr...@univ-angers.fr> writes:
> > 
> > > I tried using, in slurm.conf 
> > > TaskPlugin=task/affinity, task/cgroup 
> > > SelectTypeParameters=CR_CPU_Memory 
> > > MemLimitEnforce=yes 
> > >
> > > and in cgroup.conf: 
> > > CgroupAutomount=yes 
> > > ConstrainCores=yes 
> > > ConstrainRAMSpace=yes 
> > > ConstrainSwapSpace=yes 
> > > MaxSwapPercent=10 
> > > TaskAffinity=no 
> > 
> > We have a very similar setup, the biggest difference being that we have
> > MemLimitEnforce=no, and leave the killing to the kernel's cgroup.  For
> > us, jobs are killed as they should. [...] 
> 
> Hello Bjørn-Helge,
> 
> that is interesting. We have a very similar setup as well. However, in
> our Slurm test cluster I have noticed that it is not the *job* that
> gets killed. Instead, the OOM killer terminates one (or more)
> *processes* but keeps the job itself running in a potentially 
> unhealthy state.
> 
> Is there a way to tell Slurm to terminate the whole job as soon as 
> the first OOM kill event takes place during execution? 
> 
> Best regards
> Jürgen
> 
> -- 
> Jürgen Salk
> Scientific Software & Compute Services (SSCS)
> Kommunikations- und Informationszentrum (kiz)
> Universität Ulm
> Telefon: +49 (0)731 50-22478
> Telefax: +49 (0)731 50-22471
> 

-- 
Marcus Vincent Boden, M.Sc.
Arbeitsgruppe eScience
Tel.:   +49 (0)551 201-2191
E-Mail: mbo...@gwdg.de
---------------------------------------
Gesellschaft fuer wissenschaftliche
Datenverarbeitung mbH Goettingen (GWDG)
Am Fassberg 11, 37077 Goettingen
URL:    http://www.gwdg.de
E-Mail: g...@gwdg.de
Tel.:   +49 (0)551 201-1510
Fax:    +49 (0)551 201-2150
Geschaeftsfuehrer: Prof. Dr. Ramin Yahyapour
Aufsichtsratsvorsitzender:
Prof. Dr. Christian Griesinger
Sitz der Gesellschaft: Goettingen
Registergericht: Goettingen
Handelsregister-Nr. B 598
---------------------------------------
