On Thu, Nov 01, 2007 at 07:41:33PM -0400, George Bosilca wrote:
> There are two things that are reflected in your email.
>
> 1. You can run Open MPI (or at least ompi_info) on the head node, and
> udapl is in the list of BTLs. This means the head node has all the
> libraries required to load udapl, and your LD_LIBRARY_PATH is
> correctly configured on the head node.
>
> 2. When running between vic12-10g and vic20-10g, udapl cannot or
> refuses to be loaded. This can mean two things: either some of the
> shared libraries are missing or not in the LD_LIBRARY_PATH, or, once
> initialized, udapl detects that the connection to the remote node is
> impossible.
>
> The next thing to do is to test that your LD_LIBRARY_PATH is set
> correctly (which means it contains not only the path to the Open MPI
> libraries but also the path to the udapl libraries) for
> non-interactive shells on each node in the cluster. A "ssh vic12-10g
> printenv | grep LD_LIBRARY_PATH" should give you the answer.

Thanks for the help.
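
That check can be wrapped in a small loop to cover every node in one
shot. A rough sketch, assuming bash on the head node and passwordless
ssh to the two 10g hosts:

    for host in vic12-10g vic20-10g; do
        echo "== $host =="
        # LD_LIBRARY_PATH as seen by a non-interactive shell, plus any
        # dapl libraries already known to the dynamic linker cache
        ssh $host 'printenv LD_LIBRARY_PATH; /sbin/ldconfig -p | grep -i dapl'
    done

If either host prints a different LD_LIBRARY_PATH than the head node,
or no dapl entries at all, that node is the one to look at.
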
Per your request, I get the following:

# ssh vic12-10g printenv | grep LD
LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-1.2-svn/lib64:

That directory contains the udapl BTL libraries, as you said.

# ls -R /usr/mpi/gcc/openmpi-1.2-svn/lib64/ | grep dapl
mca_btl_udapl.la
mca_btl_udapl.so

A search on the system shows libdaplcma and libdat in /usr/lib/. For
giggles, I added /usr/lib to the environment, but the program still
fails to run with the same error.

I believe I have the correct rpms installed for the libraries. Here is
what I have on the systems:

# rpm -qa | grep dapl
dapl-devel-1.2.1-0
dapl-1.2.1-0
dapl-utils-1.2.1-0

What should I be looking to link against?
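
To see which DAT/uDAPL libraries mca_btl_udapl.so actually wants, and
whether the runtime linker can resolve them on the compute nodes, ldd
can be run against the component itself. A rough sketch, assuming the
.so lives somewhere under the lib64 tree above and is the only copy
there:

    # run through ssh so the non-interactive LD_LIBRARY_PATH is the
    # environment actually being tested
    ssh vic12-10g 'ldd $(find /usr/mpi/gcc/openmpi-1.2-svn/lib64 -name mca_btl_udapl.so)' \
        | grep -iE 'dapl|dat|not found'

Anything reported as "not found" is a library the loader cannot
resolve in that environment, and is what LD_LIBRARY_PATH (or
ld.so.conf) would need to point at.
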
Thanks,
Jon

> Thanks,
> george.
>
> On Nov 1, 2007, at 6:52 PM, Jon Mason wrote:
>
> > On Wed, Oct 31, 2007 at 06:45:10PM -0400, Tim Prins wrote:
> >> Hi Jon,
> >>
> >> Just to make sure, running 'ompi_info' shows that you have the
> >> udapl btl installed?
> >
> > Yes, I get the following:
> >
> > # ompi_info | grep dapl
> >     MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.5)
> >
> > If I do not include "self" in the mca, then I get an error saying
> > it cannot find the btl component:
> >
> > # mpirun --n 2 --host vic12-10g,vic20-10g -mca btl udapl
> > /usr/mpi/gcc/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1 pingpong
> > --------------------------------------------------------------------------
> > No available btl components were found!
> >
> > This means that there are no components of this type installed on
> > your system or all the components reported that they could not be
> > used.
> >
> > This is a fatal error; your MPI process is likely to abort. Check
> > the output of the "ompi_info" command and ensure that components of
> > this type are available on your system. You may also wish to check
> > the value of the "component_path" MCA parameter and ensure that it
> > has at least one directory that contains valid MCA components.
> > --------------------------------------------------------------------------
> > mpirun noticed that job rank 1 with PID 4335 on node vic20-10g
> > exited on signal 15 (Terminated).
> >
> > # ompi_info --all | grep component_path
> >     MCA mca: parameter "mca_component_path" (current value:
> >     "/usr/mpi/gcc/openmpi-1.2-svn/lib/openmpi:/root/.openmpi/components")
> >
> > # ls /usr/mpi/gcc/openmpi-1.2-svn/lib/openmpi | grep dapl
> > mca_btl_udapl.la
> > mca_btl_udapl.so
> >
> > So it looks to me like it should be finding it, but perhaps I am
> > lacking something in my configuration. Any ideas?
> >
> > Thanks,
> > Jon
> >
> >> Tim
> >>
> >> On Wednesday 31 October 2007 06:11:39 pm Jon Mason wrote:
> >>> I am having a bit of a problem getting udapl to work via mpirun
> >>> (over Open MPI, obviously). I am running a basic pingpong test
> >>> and I get the following error.
> >>>
> >>> # mpirun --n 2 --host vic12-10g,vic20-10g -mca btl udapl,self
> >>> /usr/mpi/gcc/open*/tests/IMB*/IMB-MPI1 pingpong
> >>> --------------------------------------------------------------------------
> >>> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> >>> If you specified the use of a BTL component, you may have
> >>> forgotten a component (such as "self") in the list of
> >>> usable components.
> >>> --------------------------------------------------------------------------
> >>> --------------------------------------------------------------------------
> >>> It looks like MPI_INIT failed for some reason; your parallel
> >>> process is likely to abort. There are many reasons that a
> >>> parallel process can fail during MPI_INIT; some of which are due
> >>> to configuration or environment problems. This failure appears to
> >>> be an internal failure; here's some additional information (which
> >>> may only be relevant to an Open MPI developer):
> >>>
> >>>   PML add procs failed
> >>>   --> Returned "Unreachable" (-12) instead of "Success" (0)
> >>> --------------------------------------------------------------------------
> >>> *** An error occurred in MPI_Init
> >>> *** before MPI was initialized
> >>> *** MPI_ERRORS_ARE_FATAL (goodbye)
> >>> --------------------------------------------------------------------------
> >>> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> >>> If you specified the use of a BTL component, you may have
> >>> forgotten a component (such as "self") in the list of
> >>> usable components.
> >>> --------------------------------------------------------------------------
> >>> --------------------------------------------------------------------------
> >>> It looks like MPI_INIT failed for some reason; your parallel
> >>> process is likely to abort. There are many reasons that a
> >>> parallel process can fail during MPI_INIT; some of which are due
> >>> to configuration or environment problems. This failure appears to
> >>> be an internal failure; here's some additional information (which
> >>> may only be relevant to an Open MPI developer):
> >>>
> >>>   PML add procs failed
> >>>   --> Returned "Unreachable" (-12) instead of "Success" (0)
> >>> --------------------------------------------------------------------------
> >>> *** An error occurred in MPI_Init
> >>> *** before MPI was initialized
> >>> *** MPI_ERRORS_ARE_FATAL (goodbye)
> >>>
> >>> The command is successful if udapl is replaced with tcp or
> >>> openib. So I think my setup is correct. Also, dapltest
> >>> successfully completes without any problems over IB or iWARP.
> >>>
> >>> Any thoughts or suggestions would be greatly appreciated.
> >>>
> >>> Thanks,
> >>> Jon
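
A possible next step, assuming the btl_base_verbose and
mca_component_show_load_errors MCA parameters are available in this
1.2-series build as they are in later releases: rerun the failing
command with the BTL framework's verbosity turned up, which should
show whether mca_btl_udapl.so fails to load at all (for example,
missing dapl/dat libraries) or loads and then decides the peer is
unreachable.

    # same failing command as above, with component load errors and
    # BTL selection decisions reported in the job output
    mpirun --n 2 --host vic12-10g,vic20-10g \
        -mca btl udapl,self \
        -mca btl_base_verbose 100 \
        -mca mca_component_show_load_errors 1 \
        /usr/mpi/gcc/open*/tests/IMB*/IMB-MPI1 pingpong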