I hope this isn't too basic of a question, but is there a document
somewhere that describes how OpenMPI selects which BTL components
(e.g., openib, tcp) to use when mpirun/mpiexec is launched?  I know the
selection can be influenced by conf files, MCA parameters, and
environment variables.  But lacking those, how does it choose which
components to use?
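
For what it's worth, I can see some of the decision by turning up the
BTL framework's verbosity, and ompi_info lists which components are
installed; the verbosity level of 100 below is just a guess at
something suitably chatty:

  $ ompi_info | grep btl
  $ mpirun --mca btl_base_verbose 100 -np 2 hello_world

But that shows me what was decided, not the rules behind the decision.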

I'm trying to diagnose an issue involving OpenMPI, OFED, and an OS
upgrade.  I'm hoping that a better understanding of how components are
selected will help me figure out what changed with the OS upgrade.

Here's a longer explanation.

We recently upgraded our HPC cluster from RHEL 6.2 to 6.6.  We have
several versions of OpenMPI available from a central NFS store.  Our
cluster has some nodes with IB hardware, and some without.

On the old OS image, we did not install any of the OFED components on
the non-IB nodes, and OpenMPI somehow figured out that it shouldn't
even try the openib BTL, without emitting any runtime warnings.  We got
the speeds we expected when running osu_bw tests from the OMB test
suite, on both the IB nodes (about 3800 MB/s for 4x QDR IB) and the
non-IB nodes (about 115 MB/s for 1GbE).
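
For reference, those numbers came from simple two-process runs along
these lines (the hostnames here are made up):

  $ mpirun -np 2 -host nodeA,nodeB osu_bw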

Since the OS upgrade, we have started getting warnings like this on
non-IB nodes without OFED installed:

> $ mpirun -np 2 hello_world
> [m7stage-1-1:09962] mca: base: component_find: unable to open 
> /apps/openmpi/1.6.3_gnu-4.4/lib/openmpi/mca_btl_ofud: librdmacm.so.1: cannot 
> open shared object file: No such file or directory (ignored)
> [m7stage-1-1:09961] mca: base: component_find: unable to open 
> /apps/openmpi/1.6.3_gnu-4.4/lib/openmpi/mca_btl_ofud: librdmacm.so.1: cannot 
> open shared object file: No such file or directory (ignored)
> [m7stage-1-1:09961] mca: base: component_find: unable to open 
> /apps/openmpi/1.6.3_gnu-4.4/lib/openmpi/mca_btl_openib: librdmacm.so.1: 
> cannot open shared object file: No such file or directory (ignored)
> [m7stage-1-1:09962] mca: base: component_find: unable to open 
> /apps/openmpi/1.6.3_gnu-4.4/lib/openmpi/mca_btl_openib: librdmacm.so.1: 
> cannot open shared object file: No such file or directory (ignored)
> Hello from process # 0 of 2 on host m7stage-1-1
> Hello from process # 1 of 2 on host m7stage-1-1

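Those look like straightforward dlopen() failures.  I assume the
missing dependency could be confirmed directly with something like this
(the .so path is inferred from the messages above):

  $ ldd /apps/openmpi/1.6.3_gnu-4.4/lib/openmpi/mca_btl_openib.so
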
Obviously these are references to software components associated with
OFED.  We can install OFED on the non-IB nodes, but then the warnings
look like this instead:

> $ mpirun -np 2 hello_world
> librdmacm: Fatal: no RDMA devices found
> librdmacm: Fatal: no RDMA devices found
> --------------------------------------------------------------------------
> [[63448,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> 
> Module: OpenFabrics (openib)
>   Host: m7stage-1-1
> 
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> Hello from process # 0 of 2 on host m7stage-1-1
> Hello from process # 1 of 2 on host m7stage-1-1
> [m7stage-1-1:18753] 1 more process has sent help message 
> help-mpi-btl-base.txt / btl:no-nics
> [m7stage-1-1:18753] Set MCA parameter "orte_base_help_aggregate" to 0 to see 
> all help / error messages

Obviously we can work around this by passing "--mca btl ^openib" or
similar on the non-IB nodes, and we're pursuing that option.
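
For anyone else hitting this, the same setting can also come from the
environment or from the system-wide params file; the paths below
assume our install prefix and the standard Open MPI layout:

  $ mpirun --mca btl ^openib -np 2 hello_world
  $ export OMPI_MCA_btl=^openib
  $ echo "btl = ^openib" >> \
      /apps/openmpi/1.6.3_gnu-4.4/etc/openmpi-mca-params.conf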

But I'm struggling to understand what changed such that OpenMPI on a
non-IB node, without OFED installed, can no longer figure out on its
own that it shouldn't use the openib BTL.  That's why I'm asking for
more information about how that decision is made; maybe it will point
me at what the OS upgrade changed.



Thanks,

-- 
Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu
