Andy

I appreciate you making me check again; things do get missed.

SELinux is off and firewalld is disabled:


[root@SRVGRIDSLURM01 ~]# sestatus
SELinux status:                 disabled

[root@SRVGRIDSLURM01 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

The one thing I can think of is that the system running slurmctld has two
network interfaces. It serves as a gateway, so it has two network addresses.
Two of the test slurmd nodes are on the other side of that gateway box; one is
on the same box. The two on the other side of the gateway have a different IP
address range and possibly a different mask.
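
A quick way to test the two-interface suspicion is to check which of the controller's interfaces, if any, shares a network with each node address. The following is only a minimal sketch using Python's standard `ipaddress` module. The interface list and both /24 masks are assumptions made for illustration (the real values would come from `ip -4 addr` on the gateway box), and `same_network` is a hypothetical helper name.

```python
import ipaddress


def same_network(iface_cidr, node_addr):
    """Return True if node_addr lies inside the network of iface_cidr."""
    iface = ipaddress.ip_interface(iface_cidr)
    return ipaddress.ip_address(node_addr) in iface.network


# ASSUMPTION: placeholder interfaces and masks for the gateway/controller box.
controller_ifaces = ["192.168.1.60/24", "10.0.0.1/24"]

# NodeAddr values as listed in slurm.conf for the two remote nodes.
nodes = {"SRVGRIDSLURM02": "192.168.1.61", "srvgridslurm03": "192.168.1.62"}

for name, addr in nodes.items():
    matches = [i for i in controller_ifaces if same_network(i, addr)]
    print(name, addr, "->", matches or "no directly attached network")
```

If a node address matches none of the interfaces, traffic to it depends on routing rather than a directly attached network, which would be worth checking next.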

This is from slurm.conf for the three nodes. I know they are talking; I can see
it in the logs when they are set to a debug logging level.
The NodeName info comes from slurmd -C, so that is correct.
I added the IP addresses (NodeAddr), but that did not matter.


# COMPUTE NODES
NodeName=SRVGRIDSLURM01 NodeAddr=192.168.1.60 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=7821
NodeName=SRVGRIDSLURM02 NodeAddr=192.168.1.61 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=7821
NodeName=srvgridslurm03 NodeAddr=192.168.1.62 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=7821

PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
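
Since the log messages only show that some traffic gets through, it may help to confirm that a plain TCP connection can be opened in both directions. The sketch below assumes Slurm's default ports (6818 for slurmd, 6817 for slurmctld), because the excerpt above does not show SlurmdPort or SlurmctldPort. `can_connect` and `check_nodes` are hypothetical helper names, not part of Slurm.

```python
import socket

# ASSUMPTION: default Slurm ports; adjust if slurm.conf overrides them.
SLURMCTLD_PORT = 6817
SLURMD_PORT = 6818

# NodeAddr values from the slurm.conf excerpt.
NODE_ADDRS = {
    "SRVGRIDSLURM01": "192.168.1.60",
    "SRVGRIDSLURM02": "192.168.1.61",
    "srvgridslurm03": "192.168.1.62",
}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_nodes(nodes, port, timeout=3.0):
    """Map each node name to whether its address accepts TCP on port."""
    return {name: can_connect(addr, port, timeout) for name, addr in nodes.items()}
```

Run on the controller, `print(check_nodes(NODE_ADDRS, SLURMD_PORT))` shows which slurmd daemons are reachable; run on each compute node, `can_connect("192.168.1.60", SLURMCTLD_PORT)` checks the return path to slurmctld.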

About the only thing I can think of is to make one of the nodes on the
other side of the gateway into the control node.



Steve Bland
Technical Product Manager

Third Party Products
Ross Video | Production Technology Experts
T: +1 (613) 228-0688 ext.4219
www.rossvideo.com<http://www.rossvideo.com/>

________________________________
From: Andy Riebs <andy.ri...@gmail.com> on behalf of Andy Riebs 
<a...@candooz.com>
Sent: 26 November 2020 13:40
To: Steve Bland <sbl...@rossvideo.com>; Slurm User Community List 
<slurm-users@lists.schedmd.com>
Subject: Re: [EXTERNAL] Re: [slurm-users] trying to diagnose a connectivity 
issue between the slurmctld process and the slurmd nodes


One last shot on the firewall front Steve -- does the control node have a 
firewall enabled? I've seen cases where that can cause the sporadic messaging 
failures that you seem to be seeing.

That failing, I'll defer to anyone with different ideas!

Andy

On 11/26/2020 1:01 PM, Steve Bland wrote: