On 26/11/20 9:21 am, Steve Bland wrote:
Sinfo always returns nodes not responding
One thing: do the nodes return to this state when you resume them with
"scontrol update node=srvgridslurm[01-03] state=resume"?
If they do, what do your slurmctld logs give as the reason?
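
For anyone following along, here is a rough sketch of that check. The function names `unhealthy_nodes` and `resume_and_check` are made up for illustration, and the 60-second wait is only a guess at how long slurmctld needs to ping the nodes again. It assumes the standard `sinfo` format fields `%N` (node name), `%T` (state) and `%E` (reason).

```shell
#!/bin/sh
# Hypothetical helper: reads "name state reason" lines on stdin, as printed by
#   sinfo -h -N -o "%N %T %E"
# and keeps nodes that are down, drained/draining, or flagged "*" (not responding).
unhealthy_nodes() {
    awk '$2 ~ /^(down|drain)/ || $2 ~ /\*$/ { print }'
}

# Hypothetical wrapper: resume the nodes, wait, then list any that fell back.
resume_and_check() {
    scontrol update nodename="$1" state=resume || return 1
    sleep "${2:-60}"   # assumption: 60 s is enough for slurmctld to re-ping the nodes
    sinfo -h -N -n "$1" -o "%N %T %E" | unhealthy_nodes
}
```

Calling `resume_and_check 'srvgridslurm[01-03]'` on the controller would print only the nodes that dropped back out, with the reason Slurm recorded. If you are not sure where the slurmctld log lives, `scontrol show config | grep -i SlurmctldLogFile` should show the configured path.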
You
Steve, you've exhausted my best ideas... hope someone else can jump in!
Andy
On Fri, Nov 27, 2020, 11:19 AM Steve Bland wrote:
> Andy
> I appreciate you making me check again, things do get missed
> SELinux is off, firewalld is disabled
> [root@SRVGRIDSLURM01 ~]# sestatus
> SELinux st
Andy
I appreciate you making me check again; things do get missed.
SELinux is off and firewalld is disabled.
[root@SRVGRIDSLURM01 ~]# sestatus
SELinux status: disabled
[root@SRVGRIDSLURM01 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
You can also check out the slurm.conf option
HealthCheckNodeState=CYCLE
From man slurm.conf:
"Rather than running the health check program on all nodes at the same
time, cycle through running on all compute nodes through the course of
the HealthCheckInterval. May be combined with the various node state
options."
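
As a rough illustration only, a slurm.conf fragment using it might look like the one below. The program path and the interval are placeholder assumptions, not values from this thread; only the HealthCheckNodeState setting comes from the suggestion above.

```
# Illustrative slurm.conf fragment; path and interval are assumptions.
# Script that slurmd runs on each compute node:
HealthCheckProgram=/usr/sbin/nhc
# Seconds between health check runs:
HealthCheckInterval=300
# Spread the runs across the interval instead of hitting all nodes at once:
HealthCheckNodeState=CYCLE
```

After changing slurm.conf, the daemons need to pick it up, for example with "scontrol reconfigure".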
--
Chee