On Jan 25, 2010, at 11:58 AM, Mathieu Gontier wrote:

> I built OpenMPI-1.4.1 without openib support with the following configuration 
> options:
> 
> ./configure 
> --prefix=/develop/libs/OpenMPI/openmpi-1.4.1/LINUX_GCC_4_1_tcp_mach 
> --enable-static --enable-shared --enable-cxx-exceptions --enable-mpi-f77 
> --disable-mpi-f90 --enable-mpi-cxx --disable-mpi-cxx-seek --enable-dist 
> --enable-mpi-profile --enable-binaries --enable-mpi-threads 
> --enable-memchecker --disable-debug --with-pic --with-threads   --with-sge

Note that you should not use --enable-dist.  --enable-dist is used by the OMPI 
maintainers ONLY when generating official downloadable tarballs.  It is *NOT* 
guaranteed to make sane / correct builds for general purpose runs.  Here's what 
./configure --help says about --enable-dist:

  --enable-dist           guarantee that that the "dist" make target will be
                          functional, although may not guarantee that any
                          other make target will be functional.

Specifically: --enable-dist allows some configure tests to "pass" even though 
they shouldn't.  For example, I don't have MX installed on my systems.  But 
with --enable-dist, the MX tests in OMPI's configure script will "pass" just 
enough so that I can "make dist" to generate a tarball and still include all 
the MX plugin source code.  

> On my cluster, I run a small test (a broadcast on a 100 integer array) on 12 
> processes balanced on 3 nodes, but I asked for using openib. It works with 
> the following messages:
> 
> mpirun -np 12 -hostfile /tmp/72936.1.64.q/machines --mca btl openib,sm,self 
> /home/numeca/tmp/gontier/bcast/exe_ompi_cluster -nloop 2 -nbuff 100

Are your PATH and LD_LIBRARY_PATH set correctly such that you'll find the 
"right" ones (i.e., the ones that you just built/installed in 
/develop/libs/OpenMPI/openmpi-1.4.1/LINUX_GCC_4_1_tcp_mach)?  I.e., is it 
possible that you're finding some other OMPI install that has OpenFabrics 
support?
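
A quick way to check is something like the sketch below.  It assumes the 
prefix from your configure line; under_prefix is just a helper name I made up 
for illustration, not part of OMPI.

```shell
# Assumption: PREFIX is the install prefix from the configure line quoted above.
PREFIX=/develop/libs/OpenMPI/openmpi-1.4.1/LINUX_GCC_4_1_tcp_mach

# Hypothetical helper: print "yes" if path $1 lies under directory $2, else "no".
under_prefix() {
    case "$1" in
        "$2"/*) echo yes ;;
        *)      echo no ;;
    esac
}

# Show which mpirun / ompi_info the shell would actually pick up.
for tool in mpirun ompi_info; do
    found=$(command -v "$tool" 2>/dev/null || true)
    if [ -z "$found" ]; then
        echo "$tool: not found in PATH"
    else
        echo "$tool: $found (under prefix: $(under_prefix "$found" "$PREFIX"))"
    fi
done

# Print LD_LIBRARY_PATH one entry per line so a stray OMPI lib directory stands out.
printf '%s\n' "${LD_LIBRARY_PATH:-}" | tr ':' '\n'
```

Run it on a compute node as well as on the head node, since the environment 
there may differ.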

Further, did you ever previously install Open MPI into that prefix and include 
OpenFabrics support?  I ask because OMPI's OpenFabrics support is in the form 
of a plugin -- if you simply installed another copy of OMPI into the same 
prefix without uninstalling first, the OpenFabrics plugin could still have been 
left in the tree, and therefore used at run time.
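
To look for a leftover plugin, a sketch like this may help.  It assumes the 
default layout, where plugins are installed under $prefix/lib/openmpi with 
names starting with mca_btl_openib; list_openib_plugins is a made-up helper 
name.

```shell
# Assumption: PREFIX is the install prefix from the configure line quoted above.
PREFIX=/develop/libs/OpenMPI/openmpi-1.4.1/LINUX_GCC_4_1_tcp_mach

# Hypothetical helper: list any openib BTL plugin files under prefix $1.
# Prints nothing if there are none.
list_openib_plugins() {
    ls "$1"/lib/openmpi/mca_btl_openib* 2>/dev/null || true
}

list_openib_plugins "$PREFIX"
```

If that prints files you did not expect, the simplest fix is probably to 
remove the old installation (or install into a fresh, empty prefix) and then 
run "make install" again.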

Finally, note that you didn't tell Open MPI to *NOT* build OpenFabrics support.  
In this case, OMPI's configure script looks for OpenFabrics support, and if it 
finds it, builds it.  But if it doesn't find OpenFabrics support (and you 
didn't specifically ask for it), it just skips it and keeps going.  You might 
want to look through the output of OMPI's configure and see if it found 
OpenFabrics support and therefore decided to build it.
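
One way to check is to search config.log in the build directory, roughly as 
sketched here.  I'm not assuming any particular wording in the log, so this 
just shows every line that mentions openib; show_openib_lines is a made-up 
helper name.

```shell
# Hypothetical helper: show the first lines of file $1 that mention openib.
show_openib_lines() {
    if [ -f "$1" ]; then
        grep -i 'openib' "$1" | head -n 20 || true
    else
        echo "$1 not found; run this from the Open MPI build directory"
    fi
}

# Assumption: the current directory is where ./configure was run.
show_openib_lines config.log

# To ask configure to skip OpenFabrics support explicitly, the 1.4 series
# should accept the negated form of its --with-openib option:
#   ./configure --without-openib   (plus your other options)
```

After reconfiguring and reinstalling, "ompi_info | grep openib" should no 
longer list the component, provided no stale plugin is left in the prefix.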

> I finally run ompi_info:
> 
> ./ompi_info | grep openib
>                  MCA btl: openib (MCA v2.0, API v2.0, Component v1.4.1)
> 
> Openib seems to be supported. That is weird because I did not ask for...

Yep; see above.

> So, assuming the compilation of OpenMPI which does not support openib here, 
> what happened? Was tcp selected? How can I check which device has been used 
> (or force an explicit message)?

Unfortunately, OMPI currently lacks a good message indicating which device is 
used at run-time (because it's actually a surprisingly complex issue, since 
OMPI chooses a communication device based on which peer it's talking to, among 
other reasons).  We hope to have a good message sometime in the OMPI 1.5 
series.
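
In the meantime, a rough workaround is to restrict the BTL list yourself and 
turn up the BTL framework verbosity, as in the sketch below.  The hostfile and 
executable are the ones from your mpirun line; btl_opts is a made-up helper 
name, and the verbose output format varies between versions.

```shell
# Assumptions: hostfile and executable are taken from the mpirun line quoted above.
HOSTFILE=/tmp/72936.1.64.q/machines
EXE=/home/numeca/tmp/gontier/bcast/exe_ompi_cluster

# Hypothetical helper: print the MCA options that restrict Open MPI to the
# BTL list in $1 and set BTL framework verbosity to $2 (default 30).
btl_opts() {
    echo "--mca btl $1 --mca btl_base_verbose ${2:-30}"
}

if command -v mpirun >/dev/null 2>&1; then
    # With only tcp,sm,self listed, the openib BTL cannot be selected.
    # The unquoted $(...) is intentional so the options split into words.
    mpirun -np 12 -hostfile "$HOSTFILE" $(btl_opts tcp,sm,self) \
        "$EXE" -nloop 2 -nbuff 100
else
    echo "mpirun not found in PATH"
fi
```

Using "--mca btl ^openib" instead excludes only the openib component and lets 
OMPI pick from everything else.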

> By the way, what is the meaning of this message in my case?

Do you mean this message?

-----
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   node005
  Local device: mthca0
-----

If so, it means that Open MPI was unable to initialize the InfiniBand HCA known 
as "mthca0" on the server known as node005.  

The RLIMIT messages are likely symptoms of the issue; you likely need to set 
your registered memory limits to "unlimited".  See the OpenFabrics section of 
the OMPI FAQ, which covers registered memory limits, for instructions on how 
to do so.
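
As a quick check, you can print the locked-memory limit on each node with 
something like this sketch (describe_memlock is a made-up helper name).  The 
limits.conf lines in the comments are the usual Linux PAM way to raise it; your 
distribution may differ.

```shell
# Hypothetical helper: describe a locked-memory limit value as reported by "ulimit -l".
describe_memlock() {
    if [ "$1" = "unlimited" ]; then
        echo "memlock limit is unlimited"
    else
        echo "memlock limit is $1 (likely too low for OpenFabrics; aim for unlimited)"
    fi
}

# Check the limit in the current shell.
describe_memlock "$(ulimit -l)"

# Typical /etc/security/limits.conf entries to raise the limit (needs root,
# takes effect on the next login):
#   * soft memlock unlimited
#   * hard memlock unlimited
```

Check it in the same context your MPI processes run in (e.g., inside an SGE 
job), since daemons started at boot may not pick up the same limits as an 
interactive login.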

> By the way, another different think: does OpenMPI must be compiled with 
> gcc-4.1 or later, or gcc-3.4 (for example) can be used? 

gcc 3.4 should be fine.

-- 
Jeff Squyres
jsquy...@cisco.com

