Re: [OMPI users] OpenMPI Giving problems when using -mca btl mx, sm, self
Hi Terry,

Thanks for replying. The following command is working fine:

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl tcp,sm,self -machinefile machines ./hello

The contents of machines are:
indus1
indus2
indus3
indus4

I have tried using np=2 over pairs of machines, but the problem is the same.
The errors that occur are given below with the commands that I am trying.

**Test 1**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus1,indus2" ./hello
--
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

PML add procs failed
--> Returned "Unreachable" (-12) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

PML add procs failed
--> Returned "Unreachable" (-12) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

**Test 2**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus1,indus3" ./hello
--
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

PML add procs failed
--> Returned "Unreachable" (-12) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

PML add procs failed
--> Returned "Unreachable" (-12) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

**Test 3**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host "indus1,indus4" ./hello
--
Process 0.1.0 is unable to reach 0.1.1 for MPI communication. If you specified the
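A quick sanity check for this kind of "Unreachable" failure is to confirm that the mx BTL and MTL components were actually built into the installation on each node. This is a minimal sketch, assuming ompi_info is installed alongside mpirun in the same HPC7.0 tree:

    /opt/SUNWhpc/HPC7.0/bin/ompi_info | grep mx
    # If no mx entries are listed for the btl and mtl frameworks, the MX
    # components were not built into this installation, which would explain
    # why the mx BTL cannot reach the remote process.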
Re: [OMPI users] Open MPI on 64 bits intel Mac OS X
Brian,

thank you very much for your suggestion. I have successfully recompiled Open MPI for 64 bits and it works like a charm. Anyway, it would be nice to have this option available as a configure switch.

Cheers,
Massimo

On Sep 28, 2007, at 3:28 PM, Brian Barrett wrote:
> On Sep 28, 2007, at 4:56 AM, Massimo Cafaro wrote:
>> Dear all,
>>
>> when I try to compile my MPI code on 64 bits intel Mac OS X the build
>> fails since the Open MPI library has been compiled using 32 bits. Can you
>> please provide in the next version the ability at configure time to choose
>> between 32 and 64 bits, or even better compile by default using both modes?
>>
>> To reproduce the problem, simply compile on 64 bits intel Mac OS X an MPI
>> application using mpicc -arch x86_64. The 64 bits linker complains as follows:
>>
>> ld64 warning: in /usr/local/mpi/lib/libmpi.dylib, file is not of required architecture
>> ld64 warning: in /usr/local/mpi/lib/libopen-rte.dylib, file is not of required architecture
>> ld64 warning: in /usr/local/mpi/lib/libopen-pal.dylib, file is not of required architecture
>>
>> and a number of undefined symbols are shown, one for each MPI function
>> used in the application.
>
> This is already possible. Simply use the configure options:
>
> ./configure ... CFLAGS="-arch x86_64" CXXFLAGS="-arch x86_64" OBJCFLAGS="-arch x86_64"
>
> Also set FFLAGS and FCFLAGS to "-m64" if you have the gfortran/g95 compiler
> installed. The common installs of either don't speak the -arch option, so
> you have to use the more traditional -m64.
>
> Hope this helps,
>
> Brian
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
***
Massimo Cafaro, Ph.D.
Assistant Professor
Dept. of Engineering for Innovation
University of Salento, Lecce, Italy
National Nanotechnology Laboratory (NNL/CNR-INFM)
Euro-Mediterranean Centre for Climate Change
SPACI Consortium
Via per Monteroni, 73100 Lecce, Italy
Voice +39 0832 297371   Fax +39 0832 298173
Web http://sara.unile.it/~cafaro
E-mail massimo.caf...@unile.it   caf...@cacr.caltech.edu
***
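Putting Brian's flags together, a full 64-bit rebuild might look like the sketch below. The /usr/local/mpi prefix is taken from the paths in the linker warnings above (adjust it to your own install location); the final file command is simply a quick check that the rebuilt library really is x86_64:

    ./configure --prefix=/usr/local/mpi CFLAGS="-arch x86_64" CXXFLAGS="-arch x86_64" \
        OBJCFLAGS="-arch x86_64" FFLAGS="-m64" FCFLAGS="-m64"
    make all install

    # should now report a 64-bit (x86_64) Mach-O shared library
    file /usr/local/mpi/lib/libmpi.dylib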
Re: [OMPI users] OpenMPI Giving problems when using -mca btl mx, sm, self
I would recommend trying a few things:

1. Set some debugging flags and see if that helps. So, I would try something like:
/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,self -host "indus1,indus2" -mca btl_base_debug 1000 ./hello

This will output information as each btl is loaded, and whether or not the load succeeds.

2. Try running with the mx mtl instead of the btl:
/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca mtl mx -host "indus1,indus2" ./hello

Similarly, for debug output:
/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca mtl mx -host "indus1,indus2" -mca mtl_base_debug 1000 ./hello

Let me know if any of these work.

Thanks,

Tim

On Saturday 29 September 2007 01:53:06 am Hammad Siddiqi wrote:
> Hi Terry,
>
> Thanks for replying. The following command is working fine:
>
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl tcp,sm,self -machinefile
> machines ./hello
>
> The contents of machines are:
> indus1
> indus2
> indus3
> indus4
>
> I have tried using np=2 over pairs of machines, but the problem is the same.
> The errors that occur are given below with the commands that I am trying.
>
> **Test 1**
>
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
> "indus1,indus2" ./hello
> --
> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --
> --
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> --
> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --
> --
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> **Test 2**
>
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
> "indus1,indus3" ./hello
> --
> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --
> --
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> --
> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --
> --
>
Re: [OMPI users] OpenMPI Giving problems when using -mca btl mx, sm, self
To use Tim Prins' second suggestion, you would also need to add "-mca pml cm" to the runs with "-mca mtl mx".

On 9/29/07, Tim Prins wrote:
> I would recommend trying a few things:
>
> 1. Set some debugging flags and see if that helps. So, I would try something
> like:
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl
> mx,self -host "indus1,indus2" -mca btl_base_debug 1000 ./hello
>
> This will output information as each btl is loaded, and whether or not the
> load succeeds.
>
> 2. Try running with the mx mtl instead of the btl:
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca mtl mx -host "indus1,indus2" ./hello
>
> Similarly, for debug output:
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca mtl mx -host "indus1,indus2" -mca
> mtl_base_debug 1000 ./hello
>
> Let me know if any of these work.
>
> Thanks,
>
> Tim
>
> On Saturday 29 September 2007 01:53:06 am Hammad Siddiqi wrote:
> > Hi Terry,
> >
> > Thanks for replying. The following command is working fine:
> >
> > /opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl tcp,sm,self -machinefile
> > machines ./hello
> >
> > The contents of machines are:
> > indus1
> > indus2
> > indus3
> > indus4
> >
> > I have tried using np=2 over pairs of machines, but the problem is the same.
> > The errors that occur are given below with the commands that I am trying.
> >
> > **Test 1**
> >
> > /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
> > "indus1,indus2" ./hello
> > --
> > Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> > If you specified the use of a BTL component, you may have
> > forgotten a component (such as "self") in the list of
> > usable components.
> > --
> > --
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or environment
> > problems. This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> > PML add procs failed
> > --> Returned "Unreachable" (-12) instead of "Success" (0)
> > --
> > *** An error occurred in MPI_Init
> > *** before MPI was initialized
> > *** MPI_ERRORS_ARE_FATAL (goodbye)
> > --
> > Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> > If you specified the use of a BTL component, you may have
> > forgotten a component (such as "self") in the list of
> > usable components.
> > --
> > --
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or environment
> > problems. This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> > PML add procs failed
> > --> Returned "Unreachable" (-12) instead of "Success" (0)
> > --
> > *** An error occurred in MPI_Init
> > *** before MPI was initialized
> > *** MPI_ERRORS_ARE_FATAL (goodbye)
> >
> > **Test 2**
> >
> > /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
> > "indus1,indus3" ./hello
> > --
> > Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> > If you specified the use of a BTL component, you may have
> > forgotten a component (such as "self") in the list of
> > usable components.
> > --
> > --
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or environment
> > problems. This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> > PML add procs failed
> > --> Returned "Unreachable" (-12) instead of "Success" (0)
> > --
> > *** An error occurred in MPI_Init
> > *** before MPI was initialized
> > *** MPI_ERRORS_ARE_FATAL (goodbye)
> > --
> > Process
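For concreteness, combining this with Tim's second suggestion, the MTL test runs would become:

    /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca pml cm -mca mtl mx -host "indus1,indus2" ./hello

    # and, with debug output enabled:
    /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca pml cm -mca mtl mx -host "indus1,indus2" -mca mtl_base_debug 1000 ./hello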
[OMPI users] Make error - MacOSX, Intel v10 compilers and Xgrid MCA
(I sent this already, but it didn't appear on the list. The tar-gzipped output from configure and make was over 100kB, so I am sending again without that attached.)

It seems that the XGrid MCA with OpenMPI 1.2.4 does not compile on a Mac/Intel system using the latest Intel compilers (seems to be OK with gcc). I downloaded the latest (Intel v10 20070809) C/C++ and Fortran demos and get the following error when building OpenMPI (output from configure and make are available but possibly too large for the mailing list):

./configure CC=icc CXX=icpc F77=ifort F90=ifort
[...ok...]
make all
[...]
/bin/sh ../../../../libtool --mode=link gcc -g -O2 -module -avoid-version -framework XGridFoundation -framework Foundation -export-dynamic -Wl,-u,_munmap -Wl,-multiply_defined,suppress -o mca_pls_xgrid.la -rpath /usr/local/lib/openmpi src/pls_xgrid_component.lo src/pls_xgrid_module.lo src/pls_xgrid_client.lo /Users/conway/programs/openMPI/openmpi-1.2.4/orte/libopen-rte.la /Users/conway/programs/openMPI/openmpi-1.2.4/opal/libopen-pal.la
libtool: link: unable to infer tagged configuration
libtool: link: specify a tag with `--tag'
make[2]: *** [mca_pls_xgrid.la] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

What I notice here is that despite my specification of the Intel compilers on the configure command line (including the correct C++ icpc compiler!), the libtool command that fails seems to be using gcc (... --mode=link gcc ...) on the Xgrid sources. This is part of the Modular Component Architecture (MCA) setup (configure.out) and also uses gcc for the compiles:

libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I/Users/conway/programs/openMPI/openmpi-1.2.4/include -I../../../.. -D_REENTRANT -g -O2 -MT src/pls_xgrid_module.lo -MD -MP -MF src/.deps/pls_xgrid_module.Tpo -c src/pls_xgrid_module.m -fno-common -DPIC -o src/.libs/pls_xgrid_module.o

I wouldn't expect this, but I can't say if it is intended or not. This particular error can be avoided by excluding xgrid:

./configure CC=icc CXX=icpc F77=ifort F90=ifort --without-xgrid

James Conway

PS. Please note that the instructions for collecting install and make information are not quite right, maybe out of date. On this page:
http://www.open-mpi.org/community/help/
the following instruction is given:
shell% cp config.log share/include/ompi_config.h $HOME/ompi-output
There is no "share" directory in the openMPI area, and the file seems instead to be in "ompi":
ompi/include/ompi_config.h

--
James Conway, PhD.
Department of Structural Biology
University of Pittsburgh School of Medicine
Biomedical Science Tower 3, Room 2047
3501 5th Ave
Pittsburgh, PA 15260 U.S.A.
Phone: +1-412-383-9847
Fax: +1-412-648-8998
Email: jxc...@pitt.edu
--
Re: [OMPI users] Make error - MacOSX, Intel v10 compilers and Xgrid MCA
On Sep 29, 2007, at 5:15 PM, James Conway wrote:
> What I notice here is that despite my specification of the Intel
> compilers on the configure command line (including the correct C++
> icpc compiler!), the libtool command that fails seems to be using gcc
> (... --mode=link gcc ...) on the Xgrid sources. This is part of the
> Modular Component Architecture (MCA) setup (configure.out) and also
> uses gcc for the compiles:
>
> libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I/Users/conway/programs/openMPI/openmpi-1.2.4/include -I../../../.. -D_REENTRANT -g -O2 -MT src/pls_xgrid_module.lo -MD -MP -MF src/.deps/pls_xgrid_module.Tpo -c src/pls_xgrid_module.m -fno-common -DPIC -o src/.libs/pls_xgrid_module.o
>
> I wouldn't expect this, but I can't say if it is intended or not. This
> particular error can be avoided by excluding xgrid:
>
> ./configure CC=icc CXX=icpc F77=ifort F90=ifort --without-xgrid

The XGrid PLS component is actually written in Objective C, as it needs to use the XGrid Framework, which is in Objective C. While gcc on OS X is both a C and Objective C compiler, icc is only a C compiler. So gcc is being invoked as the Objective C compiler in this case.

Unfortunately, libtool doesn't properly speak Objective C, so when the C compiler and Objective C compiler are different, it can get confused. We had a workaround for previous 1.2 releases, but with 1.2.4, we broke our workaround. A new, more stable workaround has been committed and should be part of the 1.2.5 release. In the meantime, disabling XGrid will obviously work around the issue.

Brian