On 20/1/23 3:51 am, Stefan Staeglich wrote:
But someone who is actually using an UnkillableStepProgram stated the opposite
(that it's executed on the controller nodes). Are you aware of any change
between Slurm releases? Maybe one of the two parts is just a leftover. Are you
using a UnkillableSte
Hi Chris,
Thank you. I overlooked this part.
But someone who is actually using an UnkillableStepProgram stated the opposite
(that it's executed on the controller nodes). Are you aware of any change
between Slurm releases? Maybe one of the two parts is just a leftover. Are you
using a Unkillabl
On 1/19/23 5:01 am, Stefan Staeglich wrote:
Hi,
Hiya,
I'm wondering where the UnkillableStepProgram is actually executed. According
to Mike it has to be available on every one of the compute nodes. This makes sense
only if it is executed there.
That's right, it's only executed on compute nodes
Hi,
I'm wondering where the UnkillableStepProgram is actually executed. According
to Mike it has to be available on every one of the compute nodes. This makes sense
only if it is executed there.
But the man page slurm.conf of 21.08.x states:
UnkillableStepProgram
Must be execut
Hi Luke
Thanks for the heads-up
From: slurm-users On Behalf Of Luke Yeager
Sent: Wednesday, 24 March 2021 4:58 AM
To: Slurm User Community List
Subject: Re: [slurm-users] Slurm - UnkillableStepProgram
While you're looking at this, make sure you don't set UnkillableStepTimeout t
Hi Chris
Thanks for the clarification
Mike
-----Original Message-----
From: slurm-users On Behalf Of Chris Samuel
Sent: Tuesday, 23 March 2021 5:30 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Slurm - UnkillableStepProgram
Hi Mike,
On 22/3/21 7:12 pm, Yap, Mike wrote:
While you're looking at this, make sure you don't set UnkillableStepTimeout to
a value larger than 126 seconds:
https://bugs.schedmd.com/show_bug.cgi?id=11103
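
For anyone who wants to check an existing configuration against that limit, here is a minimal sketch in Python. It assumes the setting appears on its own line as `UnkillableStepTimeout=<seconds>`; the helper names, the sample excerpt and the program path are all hypothetical, and the 126 second threshold is simply the figure quoted above, not something the script verifies.

```python
import re

# Upper bound quoted in this thread (see the bug report linked above).
MAX_SAFE_TIMEOUT = 126  # seconds

# slurm.conf parameter names are case-insensitive.
_TIMEOUT_RE = re.compile(r"UnkillableStepTimeout\s*=\s*(\d+)", re.IGNORECASE)


def unkillable_step_timeout(conf_text):
    """Return the last UnkillableStepTimeout value found, or None if unset."""
    value = None
    for line in conf_text.splitlines():
        # Drop trailing comments before matching.
        line = line.split("#", 1)[0].strip()
        match = _TIMEOUT_RE.match(line)
        if match:
            value = int(match.group(1))
    return value


def timeout_is_safe(conf_text):
    """True when the timeout is unset or does not exceed MAX_SAFE_TIMEOUT."""
    value = unkillable_step_timeout(conf_text)
    return value is None or value <= MAX_SAFE_TIMEOUT


# Hypothetical slurm.conf excerpt; the program path is made up for illustration.
sample = """
UnkillableStepTimeout=120
UnkillableStepProgram=/usr/local/sbin/unkillable-report
"""

if __name__ == "__main__":
    print(timeout_is_safe(sample))
```

In practice you would read the text from your real slurm.conf instead of `sample`; note that this sketch does not follow `Include` directives.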
From: slurm-users On Behalf Of Yap, Mike
Sent: Monday, March 22, 2021 7:13 PM
To: slurm-users@lists.schedmd.com
Subject: [s
Hi Mike,
On 22/3/21 7:12 pm, Yap, Mike wrote:
# I presume UnkillableStepTimeout is set in slurm.conf and it acts as a
timer to trigger UnkillableStepProgram
That is correct.
# UnkillableStepProgram can be used to send email or reboot the compute node.
The question is: how do we configure it?
Al
Hi All
I have been reading the archive, hoping to implement UnkillableStepTimeout and
UnkillableStepProgram in Slurm,
but I'm somewhat confused about how they are applied.
1. I presume UnkillableStepTimeout is set in slurm.conf and it acts as a
timer to trigger UnkillableStepProgram
2. Unkillab