We set SlurmdTimeout=600. The docs say not to go any higher than 65533 seconds:

https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdTimeout
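
For reference, the setting is a single line in slurm.conf. This is only a sketch of the relevant fragment: 600 is the value we use, and the rest of your file will differ.

```
# slurm.conf (fragment) -- value in seconds.
# 600 is what we run; the documented ceiling is 65533.
SlurmdTimeout=600
```

To see the value the controller is actually using, "scontrol show config | grep SlurmdTimeout" should print it.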

The FAQ also covers SlurmdTimeout. The worst that can happen with a larger
value is that it takes longer for unresponsive nodes to be set DOWN:
>A node is set DOWN when the slurmd daemon on it stops responding for 
>SlurmdTimeout as defined in slurm.conf.

https://slurm.schedmd.com/faq.html

I wouldn't set it too high, but what counts as too high or too low will vary
from site to site, depending on how busy your controllers and your network are.

Regards
--Mick
________________________________
From: Bjørn-Helge Mevik via slurm-users <slurm-users@lists.schedmd.com>
Sent: Monday, February 12, 2024 7:16 AM
To: slurm-us...@schedmd.com <slurm-us...@schedmd.com>
Subject: [slurm-users] Re: Increasing SlurmdTimeout beyond 300 Seconds

We've been running one cluster with SlurmdTimeout = 1200 sec for a
couple of years now, and I haven't seen any problems due to that.

--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo

