Thank you. Explicitly setting the interface as shown below has resolved this.

Thanks,

Dean
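A sketch of making that setting permanent instead of passing it on every command line: the same MCA parameter can be set through the environment or the per-user parameter file. Note that eth0 is only the example name from the reply below; restricting the runtime (OOB) traffic with oob_tcp_if_include as well, and the 192.168.0.0/24 subnet, are assumptions, not something discussed in the thread.

  # environment variables picked up by mpirun
  export OMPI_MCA_btl_tcp_if_include=eth0
  export OMPI_MCA_oob_tcp_if_include=eth0

  # CIDR notation also works, which helps if interface names differ between nodes
  # (192.168.0.0/24 is a placeholder subnet)
  export OMPI_MCA_btl_tcp_if_include=192.168.0.0/24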
> On 28 Nov 2020, at 10:27, Gilles Gouaillardet via users
> <users@lists.open-mpi.org> wrote:
>
> Dean,
>
> That typically occurs when some nodes have multiple interfaces, and
> several nodes have a similar IP on a private/unused interface.
>
> I suggest you explicitly restrict the interface Open MPI should be using.
> For example, you can
>
> mpirun --mca btl_tcp_if_include eth0 ...
>
> Cheers,
>
> Gilles
>
> On Fri, Nov 27, 2020 at 7:36 PM CHESTER, DEAN (PGR) via users
> <users@lists.open-mpi.org> wrote:
>>
>> Hi,
>>
>> I am trying to set up some machines with OpenMPI connected via Ethernet to
>> expand a batch system we already have in use.
>>
>> This is already controlled with Slurm, and we are able to get a basic MPI
>> program running across 2 of the machines, but when I compile and run
>> something that actually performs communication, it fails.
>>
>> Slurm was not configured with PMI/PMI2, so we require running with mpirun
>> for program execution.
>>
>> OpenMPI is installed in my home space, which is accessible on all of the
>> nodes we are trying to run on.
>>
>> My hello world application gets the world size, rank and hostname and
>> prints this. This successfully launches and runs.
>>
>> Hello world from processor viper-03, rank 0 out of 8 processors
>> Hello world from processor viper-03, rank 1 out of 8 processors
>> Hello world from processor viper-03, rank 2 out of 8 processors
>> Hello world from processor viper-03, rank 3 out of 8 processors
>> Hello world from processor viper-04, rank 4 out of 8 processors
>> Hello world from processor viper-04, rank 5 out of 8 processors
>> Hello world from processor viper-04, rank 6 out of 8 processors
>> Hello world from processor viper-04, rank 7 out of 8 processors
>>
>> I then tried to run the OSU micro-benchmarks, but these fail to run. I get
>> the following output:
>>
>> # OSU MPI Latency Test v5.6.3
>> # Size          Latency (us)
>> [viper-01:25885] [[21336,0],0] ORTE_ERROR_LOG: Data unpack would read past
>> end of buffer in file util/show_help.c at line 507
>> --------------------------------------------------------------------------
>> WARNING: Open MPI accepted a TCP connection from what appears to be a
>> another Open MPI process but cannot find a corresponding process
>> entry for that peer.
>>
>> This attempted connection will be ignored; your MPI job may or may not
>> continue properly.
>>
>> Local host: viper-02
>> PID: 20406
>> --------------------------------------------------------------------------
>>
>> The machines are firewalled, yet ports 9000-9060 are open. I have set the
>> following MCA parameters to match the open ports:
>>
>> btl_tcp_port_min_v4=9000
>> btl_tcp_port_range_v4=60
>> oob_tcp_dynamic_ipv4_ports=9020
>>
>> OpenMPI 4.0.5 was built with GCC 4.8.5, and only the installation prefix
>> was set to $HOME/local/ompi.
>>
>> What else could be going wrong?
>>
>> Kind Regards,
>>
>> Dean
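The hello-world program described above is not shown in the thread; a minimal sketch of what it presumably looks like (standard MPI C, built with mpicc from the $HOME/local/ompi installation):

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int world_size, world_rank, name_len;
      char processor_name[MPI_MAX_PROCESSOR_NAME];

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &world_size);          /* world size */
      MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);          /* rank */
      MPI_Get_processor_name(processor_name, &name_len);   /* hostname */

      /* No point-to-point communication happens here, so the inter-node
         TCP (BTL) connections that trip up the OSU latency test are never
         exercised; that is consistent with this program running fine. */
      printf("Hello world from processor %s, rank %d out of %d processors\n",
             processor_name, world_rank, world_size);

      MPI_Finalize();
      return 0;
  }

Built and launched roughly as: mpicc hello.c -o hello && mpirun -np 8 ./hello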
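For reference, the port-related parameters quoted above can live in the per-user MCA file so every mpirun picks them up; a sketch, assuming $HOME/.openmpi/mca-params.conf (passing them as --mca options or OMPI_MCA_* environment variables is equivalent):

  # $HOME/.openmpi/mca-params.conf
  # MPI (BTL) traffic: bind within a range of 60 ports starting at 9000
  btl_tcp_port_min_v4 = 9000
  btl_tcp_port_range_v4 = 60
  # runtime (OOB) traffic
  oob_tcp_dynamic_ipv4_ports = 9020

Both the BTL port range and the OOB port need to be open in the firewall between the nodes.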