On 10/22/2010 07:36 AM, Scott Atchley wrote:
> Ray,
> 
> Looking back at your original message, you say that it works if you use the 
> Myricom supplied mpirun from the Myrinet roll. I wonder if this is a mismatch 
> between libraries on the compute nodes.
> 
> What do you get if you use your OMPI's mpirun with:
> 
> $ mpirun -n 1 -H <remote_host> ldd $PWD/<your_binary>
> 
> I am wondering if ldd finds the libraries from your compile or the Myrinet 
> roll.
> 

OK, a bit of a hiatus in trying to get this resolved. I had to tend to
other fires...

I do think I had an issue with mixed environments. It is a Rocks 5.3
test cluster, and it had an old version of OpenMPI installed as part of
the Rocks 5.3 HPC roll. I have now removed the HPC roll. All nodes were
rebuilt.

In the previous setup, we could actually run OpenMPI jobs over MX.

With all other spurious versions of OpenMPI (and MPICH for that matter)
removed, I have rebuilt and re-installed, from a fresh source tree,
OpenMPI 1.4.3. It was built with PGI 10.8 compilers.

Now, we cannot run with MX at all.

The install was built with MX.

$ ompi_info | grep mx
                 MCA btl: mx (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA mtl: mx (MCA v2.0, API v2.0, Component v1.4.3)

I can run with TCP, but now I get:

[compute-0-1.local:24863] mca: base: component_find: unable to open
/share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx: perhaps a
missing symbol, or compiled for a different version of Open MPI? (ignored)

$ ls -l /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx*
-rwxr-xr-x 1 muno muno  1070 Oct 28 12:49
/share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx.la
-rwxr-xr-x 1 muno muno 32044 Oct 28 12:49
/share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx.so
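
Since the component file is present, the "unable to open" message may
instead mean that a shared library the component depends on cannot be
resolved on the compute node, as the Open MPI help text suggests. Below is
a minimal sketch of one way to check this, assuming ldd is available on
the nodes. The helper name unresolved_libs is hypothetical, and the host
name is taken from the error above.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print the name of
# every shared library that the dynamic linker reported as "not found".
unresolved_libs() {
    awk '/not found/ { print $1 }'
}

# Example use (left commented out; adjust the host and path as needed):
# COMPONENT=/share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx.so
# mpirun -n 1 -H compute-0-1.local ldd "$COMPONENT" | unresolved_libs
```

If this prints nothing, all of the component's direct dependencies resolved
on that node, and the problem is more likely a missing symbol than a
missing library.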

mpirun -v -nolocal -np 96 --x MX_RCACHE=2 -hostfile machines  --mca mtl
mx --mca pml cm cpi.pgi
[compute-0-3.local:21116] mca: base: component_find: unable to open
/share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx: perhaps a
missing symbol, or compiled for a different version of Open MPI? (ignored)
[compute-0-3.local:21115] mca: base: component_find: unable to open
/share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx: perhaps a
missing symbol, or compiled for a different version of Open MPI? (ignored)
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      compute-0-3.local
Framework: mtl
Component: mx
--------------------------------------------------------------------------
[compute-0-3.local:21116] mca: base: components_open: component pml / cm
open function failed
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      compute-0-3.local
Framework: mtl
Component: mx
--------------------------------------------------------------------------
[compute-0-3.local:21115] mca: base: components_open: component pml / cm
open function failed
[compute-0-3.local:21117] mca: base: component_find: unable to open
/share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx: perhaps a
missing symbol, or compiled for a different version of Open MPI? (ignored)
--------------------------------------------------------------------------



--
 Ray Muno
 University of Minnesota
