On Thu, Nov 01, 2007 at 07:41:33PM -0400, George Bosilca wrote:
> There are two things reflected in your email.
> 
> 1. You can run Open MPI (or at least ompi_info) on the head node, and
> udapl is in the list of BTLs. This means the head node has all the
> libraries required to load udapl, and your LD_LIBRARY_PATH is
> correctly configured on the head node.
> 
> 2. When running between vic12-10g and vic20-10g, udapl cannot or
> refuses to be loaded. This can mean two things: either some of the
> shared libraries are missing or not in the LD_LIBRARY_PATH, or once
> initialized, udapl detects that the connection to the remote node is
> impossible.
> 
> The next thing to do is to test that your LD_LIBRARY_PATH is correctly
> set for non-interactive shells on each node in the cluster (which
> means it contains not only the path to the Open MPI libraries but also
> the path to the udapl libraries). A "ssh vic12-10g printenv | grep
> LD_LIBRARY_PATH" should give you the answer.

Thanks for the help.  Per your request, I get the following:
# ssh vic12-10g printenv | grep LD
LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-1.2-svn/lib64:

That directory contains the btl udapl libraries, as you said.
# ls -R /usr/mpi/gcc/openmpi-1.2-svn/lib64/ | grep dapl
mca_btl_udapl.la
mca_btl_udapl.so
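
For completeness, I am also going to repeat that check through a
non-interactive shell on the compute nodes themselves, since that is
the environment mpirun actually sees (the lib64 path below is just
copied from the LD_LIBRARY_PATH above):
# ssh vic12-10g 'ls -R /usr/mpi/gcc/openmpi-1.2-svn/lib64/ | grep dapl'
# ssh vic20-10g 'ls -R /usr/mpi/gcc/openmpi-1.2-svn/lib64/ | grep dapl'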

A search on the system shows libdaplcma and libdat in /usr/lib/.  For
giggles, I added /usr/lib to the env, but the program still fails to
run with the same error.
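
In case the variable is simply not reaching the remote processes, my
next attempt will be to export it explicitly with mpirun's -x option
and to turn up the BTL verbosity to see why udapl is being rejected.
This is only a sketch of what I intend to run (same binary as before,
verbosity level chosen arbitrarily):
# mpirun --n 2 --host vic12-10g,vic20-10g -x LD_LIBRARY_PATH \
      -mca btl udapl,self -mca btl_base_verbose 30 \
      /usr/mpi/gcc/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1 pingpong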

I believe I have the correct rpms installed for the libs.  Here is what
I have on the systems.
# rpm -qa | grep dapl
dapl-devel-1.2.1-0
dapl-1.2.1-0
dapl-utils-1.2.1-0

What should I be looking to link against?
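
In the meantime, I figured I would run ldd on the udapl component to
see which dapl/dat libraries it was built against and whether they
resolve.  The path below is a guess based on the ls -R above (the .so
is somewhere under lib64); adjust to wherever mca_btl_udapl.so
actually lives:
# ldd /usr/mpi/gcc/openmpi-1.2-svn/lib64/openmpi/mca_btl_udapl.so | grep -E 'dapl|dat|not found'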

Thanks,
Jon

> 
>   Thanks,
>     george
> 
> On Nov 1, 2007, at 6:52 PM, Jon Mason wrote:
> 
> >On Wed, Oct 31, 2007 at 06:45:10PM -0400, Tim Prins wrote:
> >>Hi Jon,
> >>
> >>Just to make sure, running 'ompi_info' shows that you have the udapl btl
> >>installed?
> >
> >Yes, I get the following:
> ># ompi_info | grep dapl
> >                MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.5)
> >
> >If I do not include "self" in the MCA btl list, then I get an error
> >saying it cannot find the btl components:
> >
> ># mpirun --n 2 --host vic12-10g,vic20-10g -mca btl udapl /usr/mpi/gcc/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1 pingpong
> >--------------------------------------------------------------------------
> >No available btl components were found!
> >
> >This means that there are no components of this type installed on your
> >system or all the components reported that they could not be used.
> >
> >This is a fatal error; your MPI process is likely to abort.  Check the
> >output of the "ompi_info" command and ensure that components of this
> >type are available on your system.  You may also wish to check the
> >value of the "component_path" MCA parameter and ensure that it has at
> >least one directory that contains valid MCA components.
> >
> >--------------------------------------------------------------------------
> >mpirun noticed that job rank 1 with PID 4335 on node vic20-10g exited on
> >signal 15 (Terminated).
> >
> ># ompi_info --all | grep component_path
> >                MCA mca: parameter "mca_component_path" (current value: "/usr/mpi/gcc/openmpi-1.2-svn/lib/openmpi:/root/.openmpi/components")
> >
> ># ls /usr/mpi/gcc/openmpi-1.2-svn/lib/openmpi | grep dapl
> >mca_btl_udapl.la
> >mca_btl_udapl.so
> >
> >So it looks to me like it should be finding it, but perhaps I am lacking
> >something in my configuration.  Any ideas?
> >
> >Thanks,
> >Jon
> >
> >
> >>
> >>Tim
> >>
> >>On Wednesday 31 October 2007 06:11:39 pm Jon Mason wrote:
> >>>I am having a bit of a problem getting udapl to work via mpirun (over
> >>>open-mpi, obviously).  I am running a basic pingpong test and I get the
> >>>following error.
> >>>
> >>># mpirun --n 2 --host vic12-10g,vic20-10g -mca btl udapl,self
> >>>/usr/mpi/gcc/open*/tests/IMB*/IMB-MPI1 pingpong
> >>>--------------------------------------------------------------------------
> >>>Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> >>>If you specified the use of a BTL component, you may have
> >>>forgotten a component (such as "self") in the list of
> >>>usable components.
> >>>--------------------------------------------------------------------------
> >>>--------------------------------------------------------------------------
> >>>It looks like MPI_INIT failed for some reason; your parallel process is
> >>>likely to abort.  There are many reasons that a parallel process can
> >>>fail during MPI_INIT; some of which are due to configuration or environment
> >>>problems.  This failure appears to be an internal failure; here's some
> >>>additional information (which may only be relevant to an Open MPI
> >>>developer):
> >>>
> >>> PML add procs failed
> >>> --> Returned "Unreachable" (-12) instead of "Success" (0)
> >>>--------------------------------------------------------------------------
> >>>*** An error occurred in MPI_Init
> >>>*** before MPI was initialized
> >>>*** MPI_ERRORS_ARE_FATAL (goodbye)
> >>>--------------------------------------------------------------------------
> >>>Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> >>>If you specified the use of a BTL component, you may have
> >>>forgotten a component (such as "self") in the list of
> >>>usable components.
> >>>--------------------------------------------------------------------------
> >>>--------------------------------------------------------------------------
> >>>It looks like MPI_INIT failed for some reason; your parallel process is
> >>>likely to abort.  There are many reasons that a parallel process can
> >>>fail during MPI_INIT; some of which are due to configuration or environment
> >>>problems.  This failure appears to be an internal failure; here's some
> >>>additional information (which may only be relevant to an Open MPI
> >>>developer):
> >>>
> >>> PML add procs failed
> >>> --> Returned "Unreachable" (-12) instead of "Success" (0)
> >>>--------------------------------------------------------------------------
> >>>*** An error occurred in MPI_Init
> >>>*** before MPI was initialized
> >>>*** MPI_ERRORS_ARE_FATAL (goodbye)
> >>>
> >>>
> >>>
> >>>The command is successful if udapl is replaced with tcp or openib.  So I
> >>>think my setup is correct.  Also, dapltest successfully completes
> >>>without any problems over IB or iWARP.
> >>>
> >>>Any thoughts or suggestions would be greatly appreciated.
> >>>
> >>>Thanks,
> >>>Jon
> >>>