Hi Luke,

Thanks for the heads-up.

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Luke Yeager
Sent: Wednesday, 24 March 2021 4:58 AM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Slurm - UnkillableStepProgram

While you're looking at this, make sure you don't set UnkillableStepTimeout to a value larger than 126 seconds:
https://bugs.schedmd.com/show_bug.cgi?id=11103

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Yap, Mike
Sent: Monday, March 22, 2021 7:13 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Slurm - UnkillableStepProgram

Hi All,

I have been reading the archive, hoping to implement UnkillableStepTimeout and UnkillableStepProgram in Slurm, but I'm somewhat confused about how they are applied.

1. I presume UnkillableStepTimeout is set in slurm.conf and acts as a timer that triggers UnkillableStepProgram.
2. UnkillableStepProgram can be used to send email or reboot the compute node. The question is: how do we configure it?

scontrol show config | grep -i kill
KillOnBadExit           = 1
KillWait                = 30 sec
UnkillableStepProgram   = (null)
UnkillableStepTimeout   = 300 sec

Please advise.

Thanks,
Mike
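
For the "how do we configure it" question above, here is a minimal sketch rather than a tested recipe. It assumes UnkillableStepProgram in slurm.conf points at an executable script on each compute node. The path /usr/local/sbin/unkillable_step.sh, the ADMIN_EMAIL variable and the build_message helper are hypothetical names chosen for illustration. It also assumes that SLURM_JOB_ID and SLURM_STEP_ID are exported to the program, which may depend on the Slurm version, so the script falls back to "unknown". The timeout of 120 in the comment is only an example that stays under the 126-second limit Luke mentions.

```shell
#!/bin/sh
# Hypothetical /usr/local/sbin/unkillable_step.sh, referenced from slurm.conf as:
#   UnkillableStepTimeout=120
#   UnkillableStepProgram=/usr/local/sbin/unkillable_step.sh
# Slurm runs it on the compute node once a step's processes have
# survived being killed for UnkillableStepTimeout seconds.

# Assumption: SLURM_JOB_ID and SLURM_STEP_ID are set in the environment;
# print "unknown" for either one that is missing.
build_message() {
    printf 'Unkillable step on %s: job=%s step=%s\n' \
        "$(uname -n)" "${SLURM_JOB_ID:-unknown}" "${SLURM_STEP_ID:-unknown}"
}

msg=$(build_message)

# Record the event in syslog when logger is installed.
if command -v logger >/dev/null 2>&1; then
    logger -t unkillable_step "$msg" || true
fi

# Mail an administrator only if ADMIN_EMAIL (placeholder) is set
# and a mail command is available on the node.
ADMIN_EMAIL=${ADMIN_EMAIL:-}
if [ -n "$ADMIN_EMAIL" ] && command -v mail >/dev/null 2>&1; then
    printf '%s\n' "$msg" | mail -s "Slurm unkillable step" "$ADMIN_EMAIL" || true
fi
```

After changing slurm.conf, the daemons need to pick up the new values (for example with scontrol reconfigure), and the scontrol show config | grep -i kill check from the original message should then show the script path instead of (null). A reboot action could go where the mail step is, but that is a site policy decision and is left out here.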