This is a follow-up to an earlier question; I'm trying to understand how --mca 
btl prioritizes its choices for connectivity.  Going back to my original 
setup, there are actually two networks in play.  A point-to-point 
InfiniBand network that looks like this (two fabrics, with A running opensm 
on each port):

A(port 1)(opensm)------>B
A(port 2)(opensm)------>C

The original question asked whether there was a way to avoid the problem of B 
and C not being able to talk to each other if I were to run

mpirun -host A,B,C --mca btl openib,self -d /mnt/shared/apps/myapp

"At least one pair of MPI processes are unable to reach each other for
MPI communications." ...

There is an additional network, though: an Ethernet management network that 
connects to all nodes.  If the program could retrieve the ranks from the nodes 
over TCP and then shift to openib, that would be interesting.  And, in fact, 
if I run

mpirun -host A,B,C --mca btl openib,tcp,self -d /mnt/shared/apps/myapp

The program does, in fact, run cleanly.
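One way to see which BTL is actually carrying the traffic is to raise the BTL 
framework's verbosity.  This is a sketch: btl_base_verbose is a standard Open 
MPI MCA parameter, but the useful level and the exact log format vary across 
Open MPI versions.

```shell
# Log BTL selection and wire-up decisions per peer.  The level (100)
# and the output format are version-dependent; watch the logs for
# which component (openib vs. tcp) is chosen for each remote process.
mpirun -host A,B,C \
       --mca btl openib,tcp,self \
       --mca btl_base_verbose 100 \
       /mnt/shared/apps/myapp
```

The resulting log lines should show, for each peer, which BTL modules were 
considered and which one won.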

But the question I have now is: does MPI "choose" tcp because tcp can reach 
all nodes, and then use tcp permanently, or will it fall back to openib 
wherever it can?

So, more succinctly: given a list of BTLs such as openib,tcp,self, where the 
startup phase can only reach every node over tcp but individual point-to-point 
operations between some nodes could go over openib, will mpirun use the best 
interconnect that works for each operation, or, once it finds one that the 
broadcast phase works on, will it use that one permanently?
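My understanding (worth verifying against your version's documentation) is 
that Open MPI selects a BTL per peer pair, not globally: each BTL reports 
which peers it can reach, and for each pair the reachable BTL with the 
highest exclusivity/priority wins.  On that reading, A<->B and A<->C traffic 
would go over openib while B<->C falls back to tcp.  The exclusivity values 
can be inspected with ompi_info; the exact parameter names are 
version-dependent, so this is a sketch:

```shell
# Show the exclusivity values the BTLs advertise (higher wins for a
# given peer pair).  Parameter names like btl_tcp_exclusivity may
# differ between Open MPI versions.
ompi_info --param btl tcp | grep -i exclusivity
ompi_info --param btl openib | grep -i exclusivity
```

Comparing the two numbers shows which transport Open MPI prefers when both 
can reach the same peer.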

And, as a follow-up, can I turn off the attempt to broadcast to touch all nodes?

Paul Monday
