Thanks for the prompt reply!
On Sep 27, 2011, at 6:35 AM, Salvatore Podda wrote:
We would like to know whether the Ethernet interfaces play any role in
the startup phase of an Open MPI job using InfiniBand.
In that case, where can we find some literature on this topic?
Unfortunately, there isn't much documentation about this other than
people asking questions on this list.
For the above reason, does anyone on the list know the order/ranking
in which the Ethernet interfaces will be queried when there are
multiple ones? And what are the rules?
Regards
Salvatore Podda
IP is used by default during Open MPI startup. Specifically, it is
used as our "out of band" communication channel for things like
stdin/stdout/stderr redirection, launch command relaying, process
control, etc. The OOB channel is also used by default for
bootstrapping IB queue pairs.
To clarify, note that these are two different things:
1. the out of band (OOB) channel used for process control, std*
routing, etc.
2. bootstrapping IB queue pairs
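If you want to control which Ethernet interfaces the OOB channel uses (rather than letting it query all of them), the TCP OOB component has include/exclude MCA parameters. The interface names below (eth0, eth1) are just placeholders for whatever your nodes actually have; substitute your own, and note that the include and exclude parameters are mutually exclusive:

```shell
# Restrict the out-of-band (OOB) TCP channel to a single interface
# ("eth0" is a placeholder -- use your own interface name):
mpirun --mca oob_tcp_if_include eth0 ...

# Or instead exclude interfaces you don't want queried
# (e.g., a management NIC and loopback):
mpirun --mca oob_tcp_if_exclude eth1,lo ...
```

The analogous btl_tcp_if_include / btl_tcp_if_exclude parameters control the TCP BTL, but those only matter if you're actually using TCP for MPI traffic (not relevant if all MPI traffic goes over IB).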
You can change the IB QP bootstrapping to use the OpenFabrics RDMA
communications manager (vs. our OOB channel) with the following:
mpirun --mca btl_openib_if_cpc rdmacm ...
See if that helps (although the OF RDMA CM has its own scalability
issues, also associated with ARP).
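If you want to confirm at run time which connection manager was actually selected, raising the BTL verbosity should show the component-selection output on the job's stderr. The verbosity level below is just an illustrative value; any sufficiently high number works:

```shell
# Ask the openib BTL to bootstrap QPs via the RDMA CM, and print
# verbose BTL selection output so you can confirm the choice:
mpirun --mca btl_openib_if_cpc rdmacm --mca btl_base_verbose 30 ...
```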
If your cluster is large, you might want to check out the section on
our FAQ about large clusters:
http://www.open-mpi.org/faq/?category=large-clusters
I don't think there's an entry on there yet about this, but it may
also be worthwhile to try enabling the "radix" support, a more
scalable version of our OOB routing (i.e., the tree across all the
support daemons has a much larger radix and is therefore much
flatter). Los Alamos recently committed an IB UD OOB channel plugin
to our development trunk and is comparing its performance to the
radix tree to see if it's worthwhile.
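If you want to experiment with the radix tree, the routed framework can be selected on the mpirun command line. The parameter names below assume the "routed" framework and its "radix" component as they exist in the trunk of this era, and 64 is only an example fan-out value:

```shell
# Select the radix routed component for the OOB daemon tree and
# widen the fan-out so the tree is flatter (64 is an example value):
mpirun --mca routed radix --mca routed_radix 64 ...
```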
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/