The firewalls are disabled on all nodes in my cluster, so I don't think it is a host firewall issue. It's probably our network security between the wired and wireless sides of our network. When I put the nodes back on a wired controller they work again.
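To tell "filtered somewhere in between" apart from "nothing listening", a small TCP probe run from one of the nodes toward the controller might help. This is only a rough sketch, not anything that ships with Slurm: the function name probe and the script name are made up for illustration, and the default ports are just the 6817 and 6818 mentioned in this thread. A "refused" result would suggest the path is open but nothing is listening on that port, while a "timeout" would point at something dropping packets between the two network segments.

```python
import socket
import sys


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <reason>".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but no listener.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout: often a sign of filtering.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host or name resolution failure.
        return f"error: {exc}"


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: probe_ports.py HOST [PORT ...]")
    target = sys.argv[1]
    # Default to the two ports named in this thread if none are given.
    ports = [int(p) for p in sys.argv[2:]] or [6817, 6818]
    for p in ports:
        print(f"{target}:{p} -> {probe(target, p)}")
```

For example, while an srun is still waiting, running it from a node against the HOST and PORT shown in that node's "launch task" log line should show whether the node can reach the port srun picked for that job.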

-----Original Message-----
From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Ole Holm Nielsen
Sent: Friday, February 7, 2020 2:34 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Which ports does slurm use?

On 06-02-2020 22:40, Dean Schulze wrote:
> I've moved two nodes to a different controller. The nodes are wired
> and the controller is networked via wifi. I had to open up ports 6817
> and 6818 between the wired and wireless sides of our network to get any
> connectivity.
>
> Now when I do
>
> srun -N2 hostname
>
> the jobs show connection timeouts on the nodes:
>
> [2020-02-06T14:24:37.183] launch task 60.0 request from UID:1000 GID:1000 HOST:10.204.18.232 PORT:19602
> [2020-02-06T14:24:37.183] lllp_distribution jobid [60] implicit auto binding: cores, dist 8192
> [2020-02-06T14:24:37.183] _task_layout_lllp_cyclic
> [2020-02-06T14:24:37.183] _lllp_generate_cpu_bind jobid [60]: mask_cpu, 0x0101
> [2020-02-06T14:24:37.184] _run_prolog: run job script took usec=6
> [2020-02-06T14:24:37.184] _run_prolog: prolog with lock for job 60 ran for 0 seconds
> [2020-02-06T14:24:45.224] [60.0] error: connect io: Connection timed out
> [2020-02-06T14:24:45.224] [60.0] error: IO setup failed: Connection timed out
> [2020-02-06T14:24:45.225] [60.0] error: job_manager exiting abnormally, rc = 4021
> [2020-02-06T14:24:59.538] [60.0] error: _send_launch_resp: Failed to send RESPONSE_LAUNCH_TASKS: Connection timed out
> [2020-02-06T14:24:59.551] [60.0] done with job
>
> That node used port 19602 and the other node was using port 12496.
> When I did the srun again the jobs showed two different ports on the
> nodes (58040 and 32392).
>
> How can I configure a network if srun is going to grab different ports
> each time?

Perhaps the information about firewall setup in my Wiki page can be of use:
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-firewall-for-slurm-daemons

/Ole