On Sep 27, 2011, at 6:35 AM, Salvatore Podda wrote:

>       We would like to know if the ethernet interfaces play any role in the 
> startup phase of an Open MPI job using InfiniBand.
> In this case, where can we find some literature on this topic?

Unfortunately, there isn't much documentation about this other than people 
asking questions on this list.

IP is used by default during Open MPI startup.  Specifically, it is used as our 
"out of band" (OOB) communication channel for things like stdin/stdout/stderr 
redirection, launch command relaying, process control, etc.  The OOB channel is 
also used by default for bootstrapping IB queue pairs.

To clarify, note that these are two different things:

1. the out of band (OOB) channel used for process control, std* routing, etc.
2. bootstrapping IB queue pairs
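For #1, the OOB channel runs over TCP/IP, so your ethernet interfaces are 
indeed involved in startup.  If you need to steer that traffic to a specific 
interface, something like this should work (the "eth0" below is just a 
placeholder for whichever interface you want):

    mpirun --mca oob_tcp_if_include eth0 ...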

For #2, you can change the IB QP bootstrapping to use the OpenFabrics RDMA 
connection manager (vs. our OOB channel) with the following:

    mpirun --mca btl_openib_cpc_include rdmacm ...

See if that helps (although the OF RDMA CM has its own scalability issues, also 
associated with ARP).
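You can also check which connect pseudo-components (CPCs) your openib BTL 
build supports by grepping the ompi_info output, e.g.:

    ompi_info --param btl openib | grep cpc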

If your cluster is large, you might want to check out the section of our FAQ 
about large clusters:

    http://www.open-mpi.org/faq/?category=large-clusters

I don't think there's an entry on there yet about this, but it may also be 
worthwhile to try enabling the "radix" support, a more scalable version of our 
OOB channel (i.e., the tree across all the support daemons has a much larger 
radix and is therefore much flatter); see the example below.  Los Alamos 
recently committed an IB UD OOB channel plugin to our development trunk and is 
comparing its performance to the radix tree to see if it's worthwhile.
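If you want to experiment with it, selecting the radix routed component should 
look something like this (the "routed_radix" parameter name and the fanout 
value of 64 are from memory; double-check them with ompi_info):

    mpirun --mca routed radix --mca routed_radix 64 ...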

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

