Re: [slurm-users] Set a random offset when starting node health check in SLURM

2020-11-27 Thread Bjørn-Helge Mevik
You can also check out HealthCheckNodeState=CYCLE. From man slurm.conf: "Rather than running the health check program on all nodes at the same time, cycle through running on all compute nodes through the course of the HealthCheckInterval. May be combined with the various node state options." -- Chee
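
For illustration, a minimal slurm.conf sketch of the suggestion above. The program path and the 600-second interval are taken from the original question; combining CYCLE with ANY is an assumption about the desired node states, not something stated in the thread.

```
# Health check program and interval as described in the original question
HealthCheckProgram=/usr/sbin/nhc
HealthCheckInterval=600
# CYCLE spreads the checks across the interval instead of running them
# on all nodes at once; ANY (assumed here) runs them in any node state
HealthCheckNodeState=ANY,CYCLE
```

With this setting the checks should no longer all reach the shared central resource at the same moment, which is the effect the random offset was meant to achieve.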

Re: [slurm-users] Set a random offset when starting node health check in SLURM

2020-11-26 Thread Micheal Krombopulous
Hi, We use HealthCheckProgram = /usr/sbin/nhc in Slurm to check node health every 600 seconds. However, some NHC checks point to the same central resource, so starting these checks simultaneously may lead to false alarms of service degradation. Is

[slurm-users] Set a random offset when starting node health check in SLURM

2020-11-26 Thread SJTU
Hi, We use HealthCheckProgram = /usr/sbin/nhc in Slurm to check node health every 600 seconds. However, some NHC checks point to the same central resource, so starting these checks simultaneously may lead to false alarms of service degradation. Is it possible to set a random offset to when