Hi Ole,
On 10/22/24 11:04 am, Ole Holm Nielsen via slurm-users wrote:
Some time ago it was recommended that UnkillableStepTimeout values above
127 (or 256?) should not be used, see
https://support.schedmd.com/show_bug.cgi?id=11103. I don't know if this
restriction is still valid
with recent
Hi Chris,
Thanks for confirming that UnkillableStepTimeout can have larger values
without issues. Do you have some suggestions for values that would safely
cover network filesystem delays?
Best regards,
Ole
On 10/24/24 07:51, Christopher Samuel via slurm-users wrote:
Some time ago it was re
On 22-10-2024 16:46, Paul Raines via slurm-users wrote:
I have a cron job that emails me when hosts go into drain mode and
tells me the reason (scontrol show node=$host | grep -i reason)
Instead of cron, you can also use Slurm triggers; see for example our
scripts in the page
https://github.c
I have a cron job that emails me when hosts go into drain mode and
tells me the reason (scontrol show node=$host | grep -i reason)
We get drains with the "Kill task failed" reason probably about 5 times a
day. This despite having UnkillableStepTimeout=180
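
A minimal sketch of the kind of check such a cron job could run, assuming the drain reasons are pulled with `sinfo -h -R -o '%N|%E'` (node list and reason separated by a pipe). The function name `nodes_with_reason` is made up for illustration and is not part of Slurm:

```shell
#!/bin/sh
# Print the node lists whose drain reason contains the given text.
# Input on stdin: one "NODELIST|REASON" pair per line, the layout that
# `sinfo -h -R -o '%N|%E'` is assumed to produce.
nodes_with_reason() {
    # $1: text to look for in the reason, matched case-insensitively
    awk -F'|' -v want="$1" 'index(tolower($2), tolower(want)) > 0 { print $1 }'
}

# Hypothetical cron usage (needs a working Slurm installation):
#   sinfo -h -R -o '%N|%E' | nodes_with_reason 'Kill task failed'
```

The parsing is kept separate from the `sinfo` call so it can be tried on saved output before being wired into cron or mail.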
Right now we are still handling the
On 10/21/24 4:35 am, laddaoui--- via slurm-users wrote:
It seems like there's an issue with the termination process on these nodes. Any
thoughts on what could be causing this?
That usually means processes wedged in the kernel for some reason, in an
uninterruptible sleep state. You can define
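
One way to look for such processes on an affected node, as a sketch only: on Linux, `ps` reports uninterruptible sleep with a state code starting with "D". The helper name `d_state_procs` is made up for illustration, and the `ps` invocation in the comment assumes the procps version of `ps`:

```shell
#!/bin/sh
# Print processes in uninterruptible sleep (state code starting with "D").
# Input on stdin: lines of "PID STAT COMMAND", for example from
#   ps -eo pid=,stat=,comm=
d_state_procs() {
    awk '$2 ~ /^D/ { print }'
}

# Hypothetical usage on a drained node:
#   ps -eo pid=,stat=,comm= | d_state_procs
```

Processes listed this way cannot be killed until the kernel call they are blocked in returns, which fits the "Kill task failed" drain reason.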
You were right, I found that the slurm.conf file was different between the
controller node and the computes, so I've synchronized it now. I was also
considering setting up an epilogue script to help debug what happens after the
job finishes. Do you happen to have any examples of what an epilogue
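
A rough sketch for spotting this kind of slurm.conf mismatch earlier: print a digest of the file on each machine and compare the values. The helper name `conf_digest` and the path in the usage comment are assumptions, and how the command is run on the compute nodes (ssh, pdsh, ...) is left open:

```shell
#!/bin/sh
# Print the SHA-256 digest of a config file so that copies on different
# machines can be compared by eye or by script.
# $1: path to the file (the path used below is an assumption)
conf_digest() {
    sha256sum < "$1" | cut -d' ' -f1
}

# Hypothetical usage, run on the controller and on each compute node:
#   conf_digest /etc/slurm/slurm.conf
```

Identical digests mean byte-identical files; any difference, even in whitespace or comments, changes the digest.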
Your slurm.conf should be the same on all machines (is it? you don't have
Prolog configured on some but not others?), but no, it is not mandatory to use
a prolog. I am simply surprised that you could get a "Prolog error" without
having a prolog configured, since an error in the prolog program
Hi Laura,
Thank you for your reply.
Indeed, Prolog is not configured on my machine
$ scontrol show config | grep -i prolog
Prolog = (null)
PrologEpilogTimeout = 65534
PrologSlurmctld = (null)
PrologFlags = Alloc,Contain
ResvProlog = (null)
Sr
Apologies if I'm missing this in your post, but do you in fact have a Prolog
configured in your slurm.conf?
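
A small sketch for answering that question from the running configuration rather than the file, assuming the `Key = value` layout that `scontrol show config` prints. The function name `prolog_configured` is made up for illustration:

```shell
#!/bin/sh
# Succeed (exit status 0) when the "Prolog" line of `scontrol show config`
# names a program, and fail when it is missing, empty, or "(null)".
# Input on stdin: the output of `scontrol show config`.
prolog_configured() {
    awk -F'=' '
        # Match only the exact key "Prolog", not PrologFlags, ResvProlog, etc.
        { key = $1; gsub(/[ \t]/, "", key) }
        key == "Prolog" {
            val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            found = (val != "" && val != "(null)")
        }
        END { exit (found ? 0 : 1) }
    '
}

# Hypothetical usage:
#   scontrol show config | prolog_configured && echo "Prolog is set"
```

Matching the exact key matters because a plain `grep -i prolog` also returns PrologFlags, PrologSlurmctld, and similar entries.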
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com