We bumped ours up for a while 20+ years ago, when we had a flaky network connection between two buildings holding our compute nodes. If you need more than 600 seconds, you have networking problems.
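For anyone who wants to sanity-check their own setting, here is a rough Python sketch of the rule of thumb above. It is only an illustration, not anything Slurm ships: the function name check_slurmd_timeout and the 600-second soft limit are my own assumptions, the 65533 ceiling is the figure quoted from the slurm.conf docs further down this thread, and the 300-second default is taken from the subject line. The parsing is deliberately simplistic and only looks for a SlurmdTimeout=<number> entry.

```python
import re

# Upper bound quoted from the slurm.conf documentation in this thread.
MAX_SLURMD_TIMEOUT = 65533
# Default implied by the thread subject ("beyond 300 Seconds").
DEFAULT_SLURMD_TIMEOUT = 300
# Rule of thumb from this thread; an assumption, not a Slurm constant.
SOFT_LIMIT = 600

# Case-insensitive match, tolerant of spaces around "=" (parser assumption).
_PATTERN = re.compile(r"(?i)\bSlurmdTimeout\s*=\s*(\d+)")


def check_slurmd_timeout(conf_text):
    """Return (value, verdict) for the last SlurmdTimeout set in conf_text."""
    value = DEFAULT_SLURMD_TIMEOUT
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # ignore trailing comments
        match = _PATTERN.search(line)
        if match:
            value = int(match.group(1))
    if value > MAX_SLURMD_TIMEOUT:
        return value, "invalid: above the documented maximum"
    if value > SOFT_LIMIT:
        return value, "high: look at the network before raising this further"
    return value, "ok"


if __name__ == "__main__":
    print(check_slurmd_timeout("SlurmdTimeout=600\n"))
```

Feeding it the contents of your slurm.conf gives back the effective value and a one-line verdict. Where to draw the "high" line is a site decision, so treat 600 as a starting point rather than a hard rule.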
On Mon, Feb 12, 2024 at 5:41 PM Timony, Mick via slurm-users <slurm-users@lists.schedmd.com> wrote:

> We set SlurmdTimeout=600. The docs say not to go any higher than 65533
> seconds:
>
> https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdTimeout
>
> The FAQ has info about SlurmdTimeout also. The worst thing that could
> happen is that it will take longer to set nodes as being down:
>
> > A node is set DOWN when the slurmd daemon on it stops responding for
> > SlurmdTimeout as defined in slurm.conf.
>
> https://slurm.schedmd.com/faq.html
>
> I wouldn't set it too high, but too high vs too low will vary from site to
> site, depending on how busy your controllers are and how busy your network is.
>
> Regards
> --Mick
> ------------------------------
> *From:* Bjørn-Helge Mevik via slurm-users <slurm-users@lists.schedmd.com>
> *Sent:* Monday, February 12, 2024 7:16 AM
> *To:* slurm-us...@schedmd.com <slurm-us...@schedmd.com>
> *Subject:* [slurm-users] Re: Increasing SlurmdTimeout beyond 300 Seconds
>
> We've been running one cluster with SlurmdTimeout = 1200 sec for a
> couple of years now, and I haven't seen any problems due to that.
>
> --
> Regards,
> Bjørn-Helge Mevik, dr. scient,
> Department for Research Computing, University of Oslo
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com