Thanks Michael. As far as I’m aware, set -o errexit is the same as using 
#!/bin/bash -e as the interpreter line. As I mention in the original post, I 
would like to avoid that: it still involves modifying scripts (although to a 
lesser extent), and it would end script execution on any other runtime error or 
non-zero exit code, which may not be desirable. But mainly, it can have 
unintended consequences on script execution 
(http://mywiki.wooledge.org/BashFAQ/105) and altogether does not really do what 
it claims to, potentially causing other hard-to-debug runtime errors. I have 
officially discouraged our analysts from using it for these reasons, so I would 
prefer to keep it as a very last-resort solution.
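
To illustrate the kind of surprise that FAQ describes, here is a minimal 
example (plain bash, nothing Slurm-specific):

#!/bin/bash
set -e                         # errexit

step() {
    false                      # fails, but does not abort the script here...
    echo "step: still running after the failure"
}

if step; then                  # ...because errexit is suspended for commands
    echo "step reported success"   # tested by 'if', including everything
fi                                 # inside functions they call

echo "script reaches the end despite the failed command"

Under errexit you would expect the 'false' to stop everything, but because the 
function is called in an 'if' test, the failure is silently ignored and every 
line runs.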

Sbatch doesn’t seem to have a -K argument, only srun does, which means I’d have 
to sbatch scripts that launch srun commands, which also amounts to a 
significant rewrite… I am starting to think that the feature I am after does 
not exist! Since several other people have inquired about this in the past, I 
think it would be useful to request it as a feature. Is there a place similar 
to GitHub issues where users can make these suggestions to SchedMD?

 

Cheers,


A

 

-------------------------------------------------------------

Dr. Arthur Gilly

Head of Analytics

Institute of Translational Genomics

Helmholtz-Centre Munich (HMGU)

-------------------------------------------------------------

 

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Renfro, 
Michael
Sent: Wednesday, 9 June 2021 19:32
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Kill job when child process gets OOM-killed

 

Yep, those are reasons not to create the array of 100k jobs.

 

From https://www.mail-archive.com/slurm-users@lists.schedmd.com/msg04092.html, 
deeper in the thread from one of your references, there's a mention of using 
both 'set -o errexit' inside the job script alongside setting an sbatch 
parameter of '-K' or '--kill-on-bad-exit' to have a job exit if any of its 
processes exit with a non-zero error code.

 

Assuming all your processes exit with code 0 when things are running normally, 
that could be an option.
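
For concreteness, a rough sketch of what that combination could look like 
inside a batch script, untested here and using srun's --kill-on-bad-exit flag 
(the flag lives on srun rather than sbatch) together with the alloc8Gb example 
from the original post:

#!/bin/bash
#SBATCH --mem=1G
set -o errexit                  # abort the batch script on any non-zero exit code

# Each heavy step runs as its own job step; --kill-on-bad-exit asks Slurm to
# kill the step if any of its tasks exits with a non-zero code.
srun --kill-on-bad-exit=1 ./alloc8Gb
srun --kill-on-bad-exit=1 ./alloc8Gb
srun --kill-on-bad-exit=1 ./alloc8Gb
echo done.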

 

From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Arthur 
Gilly <arthur.gi...@helmholtz-muenchen.de>
Date: Tuesday, June 8, 2021 at 10:00 PM
To: 'Slurm User Community List' <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Kill job when child process gets OOM-killed


I could say that the limit on maximum array size is lower on our cluster, and 
that we start to see I/O problems very quickly as parallelism scales (which we 
can limit with % as you mention). But the actual reason is simpler: as I 
mentioned, we have an entire collection of scripts written for a previous LSF 
system where the “kill job on OOM” setting was active. What you are suggesting 
would lead to us rewriting all these scripts so that each submitted job is 
granular (executes only one atomic task) and orchestrating all of it using 
SLURM dependencies etc. (roughly as sketched below). This is a huge 
undertaking, and I’d rather just find this setting, which I’m sure exists.
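
A hypothetical sketch of what that orchestration would look like, with 
step1.sh, step2.sh etc. standing in for what are currently loop iterations 
inside one script (job names and memory requests are placeholders):

# Hypothetical sketch only: each former loop iteration becomes its own job,
# chained with Slurm dependencies instead of running sequentially in bash.
jid1=$(sbatch --parsable --mem=8G step1.sh)
jid2=$(sbatch --parsable --mem=8G --dependency=afterok:${jid1} step2.sh)
sbatch --mem=8G --dependency=afterok:${jid2} step3.sh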

 

-------------------------------------------------------------

Dr. Arthur Gilly

Head of Analytics

Institute of Translational Genomics

Helmholtz-Centre Munich (HMGU)

-------------------------------------------------------------

 

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Renfro, 
Michael
Sent: Tuesday, 8 June 2021 20:12
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Kill job when child process gets OOM-killed

 

Any reason *not* to create an array of 100k jobs and let the scheduler just 
handle things? Current versions of Slurm support arrays of up to 4M jobs, and 
you can limit the number of jobs running simultaneously with the '%' specifier 
in your array= sbatch parameter.
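
For example, something along these lines, where the bounds, the % throttle and 
the per-task command are purely illustrative (and the array size is still 
subject to the cluster's MaxArraySize setting):

#!/bin/bash
#SBATCH --array=0-99999%200   # ~100k array tasks, at most 200 running at once
#SBATCH --mem=1G

# Each array task handles one unit of work, selected by its index.
./process_chunk "${SLURM_ARRAY_TASK_ID}"   # hypothetical per-task command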

 

From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Arthur 
Gilly <arthur.gi...@helmholtz-muenchen.de>
Date: Tuesday, June 8, 2021 at 4:12 AM
To: 'Slurm User Community List' <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Kill job when child process gets OOM-killed


Thank you Loris!

 

Like many of our jobs, this is an embarrassingly parallel analysis, where we 
have to strike a compromise between a completely granular array of >100,000 
small jobs and some kind of serialisation through loops. So the individual 
jobs where I noticed this behaviour are actually already part of an array :)

 

Cheers,


Arthur

 

-------------------------------------------------------------

Dr. Arthur Gilly

Head of Analytics

Institute of Translational Genomics

Helmholtz-Centre Munich (HMGU)

-------------------------------------------------------------

 

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Loris 
Bennett
Sent: Tuesday, 8 June 2021 16:05
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Kill job when child process gets OOM-killed

 

Dear Arthur,

Arthur Gilly <arthur.gi...@helmholtz-muenchen.de> writes:

> Dear Slurm users,
>
> 
>
> I am looking for a SLURM setting that will kill a job immediately when any 
> subprocess of that job hits an OOM limit. Several posts have touched upon 
> that, e.g.:
> https://www.mail-archive.com/slurm-users@lists.schedmd.com/msg04091.html and
> https://www.mail-archive.com/slurm-users@lists.schedmd.com/msg04190.html or
> https://bugs.schedmd.com/show_bug.cgi?id=3216, but I cannot find an answer 
> that works in our setting.
>
> 
>
> The two options I have found are:
>
> 1 Set shebang to #!/bin/bash -e, which we don’t want to do as we’d need to 
> change this for hundreds of scripts from another cluster where we had a 
> different scheduler, AND it would kill tasks for other runtime errors (e.g. 
> if one command in the
> script doesn’t find a file).
>
> 2 Set KillOnBadExit=1. I am puzzled by this one. This is supposed to be 
> overridden by srun’s -K option. Using the example below, srun -K --mem=1G 
> ./multalloc.sh would be expected to kill the job at the first OOM. But it 
> doesn’t, and happily
> keeps reporting 3 oom-kill events. So, will this work?
>
> 
>
> The reason we want this is that we have scripts that execute programs in 
> loops. These programs are slow and memory-intensive. When the first one 
> crashes due to OOM, the next iterations also crash. In the current setup, we 
> are wasting days executing loops where every iteration crashes after an hour 
> or so due to OOM.

Not an answer to your question, but if your runs are independent, would
using a job array help you here?

Cheers,

Loris

> We are using cgroups (and we want to keep them) with the following config:
>
> CgroupAutomount=yes
>
> ConstrainCores=yes
>
> ConstrainDevices=yes
>
> ConstrainKmemSpace=no
>
> ConstrainRAMSpace=yes
>
> ConstrainSwapSpace=yes
>
> MaxSwapPercent=10
>
> TaskAffinity=no
>
> 
>
> Relevant bits from slurm.conf:
>
> SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE
>
> SelectType=select/cons_tres
>
> GresTypes=gpu,mps,bandwidth
>
> 
>
> 
>
> Very simple example:
>
> #!/bin/bash
>
> # multalloc.sh – each ./alloc8Gb line runs a very simple C++ program that 
> allocates an 8 GB vector and fills it with random floats
>
> echo one
>
> ./alloc8Gb
>
> echo two
>
> ./alloc8Gb
>
> echo three
>
> ./alloc8Gb
>
> echo done.
>
> 
>
> This is submitted as follows:
>
> 
>
> sbatch --mem=1G ./multalloc.sh
>
> 
>
> The log is :
>
> one
>
> ./multalloc.sh: line 4: 231155 Killed ./alloc8Gb
>
> two
>
> ./multalloc.sh: line 6: 231181 Killed ./alloc8Gb
>
> three
>
> ./multalloc.sh: line 8: 231263 Killed ./alloc8Gb
>
> done.
>
> slurmstepd: error: Detected 3 oom-kill event(s) in StepId=3130111.batch 
> cgroup. Some of your processes may have been killed by the cgroup 
> out-of-memory handler.
>
> 
>
> I am expecting an OOM job kill right before “two”.
>
> 
>
> Any help appreciated.
>
> 
>
> Best regards,
>
> 
>
> Arthur
>
> 
>
> 
>
> -------------------------------------------------------------
>
> Dr. Arthur Gilly
>
> Head of Analytics
>
> Institute of Translational Genomics
>
> Helmholtz-Centre Munich (HMGU)
>
> -------------------------------------------------------------
>
-- 
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin          Email loris.benn...@fu-berlin.de


Helmholtz Zentrum Muenchen

Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)

Ingolstaedter Landstr. 1

85764 Neuherberg

www.helmholtz-muenchen.de

Aufsichtsratsvorsitzende: MinDir.in Prof. Dr. Veronika von Messling

Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Kerstin Guenther

Registergericht: Amtsgericht Muenchen HRB 6466

USt-IdNr: DE 129521671


