Hi,
> On 9 Mar 2018, at 21:58, Nicholas McCollum <nmccol...@asc.edu> wrote:
>
> Connection refused makes me think a firewall issue.
>
> Assuming this is a test environment, could you try on the compute node:
>
> # iptables-save > iptables.bak
> # iptables -F && iptables -X
>
> Then test to see if it works. To restore the firewall use:
>
> # iptables-restore < iptables.bak
>
> You may have to use...
>
> # systemctl stop firewalld
> # systemctl start firewalld
>
> If you use firewalld.

We’re using shorewall …

There is an srun process listening on the login node:

srun 8500 vsc40075 13u IPv4 597473 0t0 TCP *:36506 (LISTEN)

And slurmd on the worker node is trying to connect to it

[2018-03-09T22:00:44.908] [47.0] debug4: adding IO connection (logical node rank 0)
[2018-03-09T22:00:44.908] [47.0] debug4: connecting IO back to 10.141.21.202:36506
[2018-03-09T22:00:44.908] [47.0] debug: _oom_event_monitor: started.
[2018-03-09T22:00:44.908] [47.0] debug2: slurm_connect failed: Connection refused
[2018-03-09T22:00:44.908] [47.0] debug3: Error connecting, picking new stream port
[2018-03-09T22:00:44.909] [47.0] debug2: slurm_connect failed: Connection refused
[2018-03-09T22:00:44.909] [47.0] debug2: slurm_connect failed: Connection refused
[2018-03-09T22:00:44.909] [47.0] debug2: slurm_connect failed: Connection refused
[2018-03-09T22:00:44.909] [47.0] debug2: Error connecting slurm stream socket at 10.141.21.202:36506: Connection refused
[2018-03-09T22:00:44.909] [47.0] error: connect io: Connection refused

Opening ports 30000-50000 seems to do the trick. Will try to figure out what’s different on the other machines.

Thanks for the pointers and help!

-- Andy
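
P.S. For anyone who hits the same symptom: a quick way to check from the worker node whether the srun port on the login node is reachable is a small TCP probe. This is only a rough sketch, assuming Python 3 is available on the node. The function name can_connect is made up for illustration, and the address and port are the ones from the log above (srun picks a different port for each job, so check lsof or the slurmd log for the current one).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False.

    A refused or timed-out connection (for example, blocked by a firewall)
    raises OSError, which is reported here as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call from the worker node, using the values from the slurmd log:
#   can_connect("10.141.21.202", 36506)
```

If this returns False while srun is still listening on that port, the firewall on the login node is the likely culprit, which matches what we saw before opening 30000-50000.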