To add on to what Ralph said:

1. There are two different message passing paths in OMPI:
   - "OOB" (out of band): used for control messages
   - "BTL" (byte transfer layer): used for MPI traffic
   (there are actually others, but these seem to be the two relevant ones for
   your setup)

2. If you don't specify which OOB interfaces to use, OMPI will (basically) just 
pick one.  It doesn't really matter which one it uses; the OOB channel doesn't 
use much bandwidth, and is mostly used during startup and shutdown.

The one exception to this is stdout/stderr routing.  If your MPI app writes to 
stdout/stderr, this also uses the OOB path.  So if you output a LOT to stdout, 
then the OOB interface choice might matter.

3. If you don't specify which MPI interfaces to use, OMPI will basically find 
the "best" set of interfaces and use those.  IP interfaces are always rated 
lower than OS-bypass interfaces (e.g., verbs/IB).

Or, as you noticed, you can give a comma-delimited list of BTLs to use.  OMPI 
will then use at most those BTLs (it may not end up using all of them), and 
definitely no others.  Each BTL typically has one or more additional 
parameters that specify which interfaces to use for the network type that 
BTL handles.  For example, btl_tcp_if_include tells the TCP BTL which 
interface(s) to use.

Also, note that you seem to have missed a BTL: sm (shared memory).  sm is the 
preferred BTL for same-server communication.  It is much faster than both the 
TCP loopback device and the verbs (i.e., "openib") BTL for same-server 
communication.  (OMPI excludes the TCP loopback device by default, BTW, which 
is probably why you got reachability errors when you specified "--mca btl 
tcp,self".)
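To make the two points above concrete, here is a rough sketch of a TCP-only run that also includes sm and restricts the TCP BTL to one interface.  The interface name ib0 is taken from your description; "-np 4" and "./my_mpi_app" are just placeholders, not anything from your setup:

```shell
# Use only the tcp, sm, and self BTLs.
# btl_tcp_if_include limits the TCP BTL to the ib0 (IP over IB) interface;
# sm handles traffic between processes on the same server.
# "-np 4" and "./my_mpi_app" are placeholders for your own job.
mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include ib0 -np 4 ./my_mpi_app
```

Note that this only affects MPI traffic; the OOB control messages are selected separately.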

4. If you don't specify anything, OMPI usually picks the best thing for you.  
In your case, it'll probably be equivalent to:

 mpirun --mca btl openib,sm,self ...

And the control messages will flow across one of your IP interfaces.  

5. If you want to be specific about which IP interface the control messages 
use, you can specify oob_tcp_if_include.  For example:

  mpirun --mca oob_tcp_if_include eth0 ...
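If it is more convenient, MCA parameters can also be set through the environment rather than on the mpirun command line: OMPI reads them from variables named OMPI_MCA_<param>.  A minimal sketch, assuming the eth0 interface name from the example above and a placeholder application name:

```shell
# Same effect as "--mca oob_tcp_if_include eth0 --mca btl openib,sm,self",
# but expressed as environment variables that mpirun picks up.
export OMPI_MCA_oob_tcp_if_include=eth0
export OMPI_MCA_btl=openib,sm,self

# Placeholder launch line; "-np 4" and "./my_mpi_app" are hypothetical.
# mpirun -np 4 ./my_mpi_app
```

Values given on the mpirun command line take precedence over the environment, so this is mostly useful for defaults you don't want to retype.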

Make sense?



On Mar 15, 2014, at 1:18 AM, Jianyu Liu <jerry_...@msn.com> wrote:

>> On Mar 14, 2014, at 10:16:34 AM,Jeff Squyres <jsquyres_at_[hidden]> wrote: 
>> 
>>> On Mar 14, 2014, at 10:11 AM, Ralph Castain <rhc_at_[hidden]> wrote: 
>>> 
>>>> 1. If '--mca btl tcp,self' is specified, which interface will the 
>>>> application run on: the GigE adapter, or the OpenFabrics interface in IP 
>>>> over IB mode (just like a high performance GigE adapter)? 
>>> 
>>> Both - ip over ib looks just like an Ethernet adaptor 
>> 
>> 
>> To be clear: the TCP BTL will use all TCP interfaces (regardless of 
>> underlying physical transport). Your GigE adapter and your IP adapter both 
>> present IP interfaces to the OS, and both support TCP. So the TCP BTL will 
>> use them, because it just sees the TCP/IP interfaces. 
> 
> Thanks for your kindly input.
> 
> Please see if I have understood correctly
> 
> Assume there are two networks:
>   Gigabit Ethernet
> 
>     eth0-renamed : 192.168.[1-22].[1-14] / 255.255.192.0
> 
>   InfiniBand network
> 
>     ib0 :  172.20.[1-22].[1-4] / 255.255.0.0
> 
> 
> 1. If specified '--mca btl tcp,self'
> 
>     The control information ( such as setup and teardown ) are routed to and 
> passed by Gigabit Ethernet in TCP/IP mode
>     The MPI messages are routed to and passed by InfiniBand network in IP 
> over IB mode
>     On the same machine, the TCP loopback device will be used for passing 
> control and MPI messages 
> 
> 2. If specified '--mca btl tcp,self --mca btl_tcp_if_include ib0'
> 
>     Both of control information ( such as setup and teardown ) and MPI 
> messages are routed to and passed by InfiniBand network in IP over IB mode
>     On the same machine, the TCP loopback device will be used for passing 
> control and MPI messages
> 
> 
> 3. If specified '--mca btl openib,self'
> 
>     The control information ( such as setup and teardown ) are routed to and 
> passed by InfiniBand network in IP over IB mode
>     The MPI messages are routed to and passed by InfiniBand network in RDMA 
> mode
>     On the same machine, the TCP loopback device will be used for passing 
> control and MPI messages
> 
> 
> 4. If no 'mca btl' parameters are specified
> 
>     The control information ( such as setup and teardown ) are routed to and 
> passed by Gigabit Ethernet in TCP/IP mode
>     The MPI messages are routed and passed by InfiniBand network in RDMA mode
>     On the same machine, the shared memory (sm) BTL will be used for control 
> and MPI passing messages
> 
> 
> Appreciating your kindly input
> 
> Jianyu                                          


-- 
Jeff Squyres
jsquy...@cisco.com