On Jun 22, 2018, at 7:36 PM, carlos aguni <aguni...@gmail.com> wrote:
> 
> I'm trying to run a code on 2 machines that each have at least 2 network
> interfaces.
> Their interfaces are laid out as follows:
> 
>          compute01            compute02
> ens3     192.168.100.104/24   10.0.0.227/24
> ens8     10.0.0.228/24        172.21.1.128/24
> ens9     172.21.1.155/24
> ---
> 
> The issue is: when I execute `mpirun -n 2 -host compute01,compute02 hostname`
> I get the correct output, but only after a very long delay.
> 
> What I've read so far is that Open MPI runs a greedy algorithm on each
> interface that times out if it doesn't find the desired IP.
> Then I saw here (https://www.open-mpi.org/faq/?category=tcp#tcp-selection)
> that I can run commands like:
> `$ mpirun -n 2 --mca oob_tcp_if_include 10.0.0.0/24 -host
> compute01,compute02 hostname`
> But with this configuration it doesn't reach the other host(s).
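As a sketch of what the FAQ suggests: Open MPI has separate MCA parameters for its runtime traffic (`oob_tcp_if_include`) and its MPI traffic (`btl_tcp_if_include`), so restricting both to the subnet the hosts share may avoid the discovery timeouts. The CIDR below is an assumption based on the addresses in the question; adjust it to your network.

```shell
# Sketch: pin both the runtime (oob) and MPI (btl) TCP traffic to the
# 10.0.0.0/24 subnet that compute01 and compute02 share.
# Requires an Open MPI installation and reachable hosts; adapt as needed.
mpirun -np 2 \
    --mca oob_tcp_if_include 10.0.0.0/24 \
    --mca btl_tcp_if_include 10.0.0.0/24 \
    -host compute01,compute02 hostname
```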
There are actually 2 different uses of TCP in Open MPI: the MPI communications and the runtime communications.

In your scenario, the MPI communications should probably "just figure it out" (since you have 2 interfaces on the same subnets on each machine). It can do this because the runtime has already been established, and -- for lack of a longer explanation -- it can do very speedy discovery and interface matching. But the runtime has nothing else to refer to; it has to do its own discovery with no prior knowledge of anything. This is where the timeouts come in.

What you described above -- setting oob_tcp_if_include to the 10.0.0.0/24 network -- *should* work. It's a little surprising that it does not. Can you run with:

    mpirun -np 2 --mca oob_tcp_if_include 10.0.0.0/24 --mca oob_base_verbose 100 -host compute01,compute02 hostname

and see what it shows us?

> In the end I sometimes get the same timeout.
> 
> So is there a way to let it use the system's default route?

Yes and no. The problem is that in HPC environments, the default IP route is not always in the same direction as the nodes on which you're trying to run (i.e., there are a zillion different ways to set up IP networking, and Open MPI users tend to use a lot of different ones...).

-- 
Jeff Squyres
jsquy...@cisco.com

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
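To illustrate the interface-matching idea discussed above: an `*_tcp_if_include` value like `10.0.0.0/24` acts as a subnet filter over each host's addresses. The sketch below is not Open MPI's actual code, just a minimal stand-in using Python's `ipaddress` module with the addresses from the question.

```python
# Minimal sketch (not Open MPI's implementation) of how a CIDR include
# filter such as "oob_tcp_if_include 10.0.0.0/24" selects local addresses.
import ipaddress

def select_addresses(addresses, include_cidr):
    """Return only the addresses that fall inside the given subnet."""
    net = ipaddress.ip_network(include_cidr)
    return [a for a in addresses if ipaddress.ip_address(a) in net]

# Interface addresses of the two hosts from the original question:
compute01 = ["192.168.100.104", "10.0.0.228", "172.21.1.155"]
compute02 = ["10.0.0.227", "172.21.1.128"]

print(select_addresses(compute01, "10.0.0.0/24"))  # ['10.0.0.228']
print(select_addresses(compute02, "10.0.0.0/24"))  # ['10.0.0.227']
```

With the filter applied, each host is left with exactly one candidate address on the shared subnet, which is why restricting the runtime to that subnet should let the two daemons find each other immediately instead of probing every interface.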