On Jun 22, 2018, at 7:36 PM, carlos aguni <aguni...@gmail.com> wrote:
> 
> I'm trying to run a code on 2 machines that has at least 2 network interfaces 
> in it.
> So I have them as described below:
> 
>          compute01            compute02
> ens3     192.168.100.104/24   10.0.0.227/24
> ens8     10.0.0.228/24        172.21.1.128/24
> ens9     172.21.1.155/24
> ---
> 
> The issue is: when I execute `mpirun -n 2 -host compute01,compute02 hostname` 
> on them, I get the correct output, but only after a very long delay.
> 
> What I've read so far is that Open MPI greedily tries each interface and 
> times out on the ones where it can't reach the desired IP.
> Then I saw here (https://www.open-mpi.org/faq/?category=tcp#tcp-selection) 
> that I can run commands like:
> `$ mpirun -n 2 --mca oob_tcp_if_include 10.0.0.0/24 -host 
> compute01,compute02 hostname`
> But this configuration doesn't reach the other host(s).

There are actually two different uses of TCP in Open MPI: the MPI 
communications and the runtime communications.

In your scenario, the MPI communications should probably "just figure it out" 
(since you have interfaces on the same subnets on each machine).  It can do 
this because the runtime connections have already been established, and -- for 
lack of a longer explanation -- it can therefore do very speedy discovery and 
interface matching.

But the runtime has nothing else to refer to, and it has to do its own 
discovery with no prior knowledge of anything.  This is where the timeouts come 
in.
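
To make that discovery/matching step concrete: the selection boils down to a 
CIDR match between each local address and the networks it knows about.  This 
is not Open MPI's actual code -- just a plain-shell illustration of that 
match, with made-up helper names (`ip_to_int`, `in_subnet`):

```shell
# Illustration only: the kind of CIDR match that interface selection
# performs.  ip_to_int and in_subnet are hypothetical helpers, not
# Open MPI functions.
ip_to_int() {
  # Convert dotted-quad IPv4 (e.g. 10.0.0.227) to a 32-bit integer.
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}
in_subnet() {
  # Usage: in_subnet <addr> <network>/<prefixlen>
  local addr net bits mask
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (addr & mask) == (net & mask) ))
}

in_subnet 10.0.0.227 10.0.0.0/24 && echo "10.0.0.227 matches 10.0.0.0/24"
```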

What you described above -- setting oob_tcp_if_include to the 10.0.0.0/24 
network -- *should* work.  It's a little surprising that it does not.

Can you run with:

mpirun -np 2 --mca oob_tcp_if_include 10.0.0.0/24 --mca oob_base_verbose 100 
-host compute01,compute02 hostname

And see what it shows us?
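
Separately -- assuming the 10.0.0.0/24 network is in fact the one that 
connects the two hosts -- once the runtime side is sorted out, you can also 
pin the MPI traffic to the same network with the analogous BTL parameter:

```shell
# Restrict both the runtime (oob) and the MPI transport (btl/tcp) to the
# 10.0.0.0/24 network.  Adjust the CIDR if a different subnet actually
# connects compute01 and compute02.
mpirun -np 2 \
    --mca oob_tcp_if_include 10.0.0.0/24 \
    --mca btl_tcp_if_include 10.0.0.0/24 \
    -host compute01,compute02 hostname
```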

> In the end, I sometimes get the same timeout.
> 
> So is there a way to let it to use the system's default route?

Yes and no.  The problem is that in HPC environments, the default IP route is 
not always in the same direction as the nodes on which you're trying to run 
(i.e., there's a zillion different ways to set up the IP networking, and Open 
MPI users tend to use a lot of different ones...).
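
That said, if you want to key off the default route yourself, one sketch 
(assuming Linux with iproute2, and that the default-route interface is also 
the one that reaches the other node) is to extract the interface name and 
hand it to Open MPI explicitly -- the `*_tcp_if_include` parameters accept 
interface names as well as CIDR networks:

```shell
# Find the interface carrying the default route (e.g. "ens3") and tell
# Open MPI to use only that interface for both runtime and MPI traffic.
# Output of "ip route show default" looks like:
#   default via 10.0.0.1 dev ens3 ...
DEF_IF=$(ip route show default | awk '{print $5; exit}')
mpirun -np 2 \
    --mca oob_tcp_if_include "$DEF_IF" \
    --mca btl_tcp_if_include "$DEF_IF" \
    -host compute01,compute02 hostname
```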

-- 
Jeff Squyres
jsquy...@cisco.com

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
