Hi all,

I am trying to compute a large 3D FFT using fftw3_mpi on our cluster. It runs fine with 1024 ranks on 8 nodes.
However, when I try to run with 2048 ranks on 16 nodes, I get many TCP errors like the following:

    [btl_tcp_endpoint.c:733:mca_btl_tcp_endpoint_start_connect] bind on local address (10.0.20.18:0) failed: Address already in use (98)

We tried widening the range of available local IPv4 ports to the following values:

    cat /proc/sys/net/ipv4/ip_local_port_range
    1024 65000

but it did not solve the problem. The parameters btl_tcp_port_min_v4 and btl_tcp_port_range_v4 are set to 1024 and 64511, respectively.

This is running on Open MPI 4.1.0, CentOS 8.

Any help is greatly appreciated!

Cheers,
Simon

PS: Let me know if more info is needed.
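
In case it helps narrow things down, here is a rough back-of-the-envelope check of the port budget per node. It is only a sketch and rests on several assumptions that are not confirmed anywhere above: the FFT transposes make every rank talk to every other rank, each pair of ranks on different nodes uses one TCP connection, ranks on the same node use shared memory instead of TCP, and roughly half of a node's connections are initiated locally (and so need their own local port). The function names and the `initiated_fraction` parameter are made up for illustration.

```python
def usable_ports(low: int, high: int) -> int:
    """Number of ports in the inclusive range reported by ip_local_port_range."""
    return high - low + 1


def outgoing_connections_per_node(total_ranks: int, nodes: int,
                                  initiated_fraction: float = 0.5) -> int:
    """Estimate locally initiated TCP connections on one node.

    Assumptions (hypothetical, not measured): all-to-all traffic, one
    connection per pair of ranks on different nodes, and
    `initiated_fraction` of those connections opened from this node.
    """
    ranks_per_node = total_ranks // nodes
    remote_ranks = total_ranks - ranks_per_node  # peers reached over TCP
    return int(ranks_per_node * remote_ranks * initiated_fraction)


if __name__ == "__main__":
    ports = usable_ports(1024, 65000)  # values from ip_local_port_range above
    for total_ranks, nodes in [(1024, 8), (2048, 16)]:
        needed = outgoing_connections_per_node(total_ranks, nodes)
        status = "fits" if needed <= ports else "exceeds"
        print(f"{total_ranks} ranks on {nodes} nodes: "
              f"~{needed} local ports needed, {ports} available ({status})")
```

If these assumptions are anywhere near right, the 16-node run would need more local ports per node than the range can provide, whatever the sysctl is set to. I have not verified how Open MPI actually sets up its connections, though, so please treat the numbers as a guess.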