Re: [slurm-users] Which ports does slurm use?

Ole Holm Nielsen Fri, 07 Feb 2020 13:36:01 -0800

On 06-02-2020 22:40, Dean Schulze wrote:

I've moved two nodes to a different controller. The nodes are wired and the controller is networked via wifi. I had to open up ports 6817 and 6818 between the wired and wireless sides of our network to get any connectivity.
Now when I do

srun -N2 hostname

the jobs show connection timeouts on the nodes:
[2020-02-06T14:24:37.183] launch task 60.0 request from UID:1000 GID:1000 HOST:10.204.18.232 PORT:19602
[2020-02-06T14:24:37.183] lllp_distribution jobid [60] implicit auto binding: cores, dist 8192
[2020-02-06T14:24:37.183] _task_layout_lllp_cyclic
[2020-02-06T14:24:37.183] _lllp_generate_cpu_bind jobid [60]: mask_cpu,0x0101
[2020-02-06T14:24:37.184] _run_prolog: run job script took usec=6
[2020-02-06T14:24:37.184] _run_prolog: prolog with lock for job 60 ran for 0 seconds
[2020-02-06T14:24:45.224] [60.0] error: connect io: Connection timed out
[2020-02-06T14:24:45.224] [60.0] error: IO setup failed: Connection timed out
[2020-02-06T14:24:45.225] [60.0] error: job_manager exiting abnormally, rc = 4021
[2020-02-06T14:24:59.538] [60.0] error: _send_launch_resp: Failed to send RESPONSE_LAUNCH_TASKS: Connection timed out
[2020-02-06T14:24:59.551] [60.0] done with job
That node used port 19602 and the other node was using port 12496. When I did the srun again the jobs showed two different ports on the nodes (58040 and 32392).
How can I configure a network if srun is going to grab different ports each time?


Perhaps the information about firewall setup in my Wiki page can be of use:
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-firewall-for-slurm-daemons
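
One common way to make the srun ports predictable, given here as a sketch under stated assumptions and not as a quote from the page above, is to pin them with the SrunPortRange parameter in slurm.conf. By default srun listens on ephemeral ports on the host where it is run, which is why the PORT value in the slurmd log changes with every job. The range below is only an illustration; any free range at your site would do, and the firewall must then allow connections from the compute nodes to the srun host on that range.

```
# slurm.conf fragment (keep the file identical on controller, compute nodes and submit hosts)
SlurmctldPort=6817          # slurmctld listening port (the default)
SlurmdPort=6818             # slurmd listening port (the default)
SrunPortRange=60001-63000   # illustrative range: srun picks its listening ports from here
```

After editing slurm.conf, the daemons need to pick up the change before new srun invocations use the restricted range.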

/Ole
