Re: [OMPI users] Run failure on Solaris Opteron with Sun Studio 11
On Mar 8, 2006, at 4:46 AM, Pierre Valiron wrote:

> Sorry for the interruption. I'm back on MPI tracks again.
>
> I have rebuilt openmpi-1.0.2a9 with -g and the error is unchanged. I have also discovered that I don't need to run any Open MPI application to trigger the error; 'mpirun --help' or plain 'mpirun' fails the same way:
>
>   valiron@icare ~ > mpirun
>   Segmentation fault (core dumped)
>
> and
>
>   valiron@icare ~ > pstack core
>   core 'core' of 13842: mpirun
>    fd7ffee9dfe0 strlen () + 20
>    fd7ffeef6ab3 vsprintf () + 33
>    fd7fff180fd1 opal_vasprintf () + 41
>    fd7fff180f88 opal_asprintf () + 98
>    004098a3 orterun () + 63
>    00407214 main () + 34
>    0040708c ()
>
> Seems very basic!

It turns out this was an error in our compatibility code for asprintf(). We were doing something with va_list structures that Solaris didn't like. I'm actually surprised that it worked on the UltraSPARC version of Solaris, but it has been working there for some time. Anyway, I committed a fix at r9223 on the Subversion trunk - it should make tonight's nightly tarball for the trunk. I've also asked the release managers for v1.0.2 to push the fix into that release.

Thanks for reporting the issue and for the account. Let me know if you have any further problems.

Brian

--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/
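
For the curious, the usual portability trap here is reusing a va_list after a v*printf() call has consumed it; a portable vasprintf() has to work on a va_copy() of the list. A minimal sketch of that general pattern (an illustration only, not the actual r9223 change):

  #include <stdarg.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* Illustration of the va_copy pattern; not the actual opal_vasprintf(). */
  static int my_vasprintf(char **strp, const char *fmt, va_list ap)
  {
      va_list ap2;
      int len;

      /* vsnprintf() consumes the va_list, so size the string using a
       * copy; the original is then still valid for the real print. */
      va_copy(ap2, ap);
      len = vsnprintf(NULL, 0, fmt, ap2);
      va_end(ap2);
      if (len < 0) return -1;

      *strp = malloc((size_t)len + 1);
      if (*strp == NULL) return -1;

      return vsnprintf(*strp, (size_t)len + 1, fmt, ap);
  }

  static int my_asprintf(char **strp, const char *fmt, ...)
  {
      va_list ap;
      int ret;

      va_start(ap, fmt);
      ret = my_vasprintf(strp, fmt, ap);
      va_end(ap);
      return ret;
  }

On x86-64 ABIs, skipping the va_copy() and walking the same va_list twice tends to crash exactly the way the pstack trace above shows.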
Re: [OMPI users] Run failure on Solaris Opteron with Sun Studio 11
Brian,

Thanks for the quick night fix. I could not find r9223 on the Subversion trunk, so I downloaded r9224 instead.

- Configure and compile are okay.

- However, compiling mpi.f90 takes over 35 *minutes* with -O1. This seems a bit excessive... I tried removing the -O option entirely and things are just as slow. Is this behaviour related to Open MPI or to some misfeature of the Studio 11 compiler?

- 'mpirun --help' no longer crashes.

- Standard output seems messy:

  a) 'mpirun -np 4 pwd' randomly returns one or two lines, never four. The same behaviour occurs if the output is redirected to a file.

  b) When running some simple "demo" Fortran code, the standard output is buffered within Open MPI and all results are issued at the end. No intermediate output is shown.

- Running a slightly more elaborate program fails:

  a) Compilation behaves differently with mpif77 and mpif90. While mpif90 compiles and builds "silently", mpif77 is talkative:

    valiron@icare ~/BENCHES > mpif77 -xtarget=opteron -xarch=amd64 -o all all.f
    NOTICE: Invoking /opt/Studio11/SUNWspro/bin/f90 -f77 -ftrap=%none -I/users/valiron/lib/openmpi-1.1a1r9224/include -xtarget=opteron -xarch=amd64 -o all all.f -L/users/valiron/lib/openmpi-1.1a1r9224/lib -lmpi -lorte -lopal -lsocket -lnsl -lrt -lm -lthread -ldl
    all.f:
     rw_sched:
    MAIN all:
     lam_alltoall:
     my_alltoall1:
     my_alltoall2:
     my_alltoall3:
     my_alltoall4:
     check_buf:
     alltoall_sched_ori:
     alltoall_sched_new:

  b) Whether the code was compiled with mpif77 or mpif90, execution fails:

    valiron@icare ~/BENCHES > mpirun -np 2 all
    Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR)
    Failing at addr:40
    *** End of error message ***
    Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR)
    Failing at addr:40
    *** End of error message ***

  Compiling with -g adds no more information.

I attach the all.f program... (this program was used last summer to discuss several strategies for alltoall over ethernet on the LAM/MPI list).

Pierre.

Brian Barrett wrote:
> It turns out this was an error in our compatibility code for asprintf(). We were doing something with va_list structures that Solaris didn't like. [...] Anyway, I committed a fix at r9223 on the subversion trunk - it should make tonight's nightly tarball for the trunk.

--
Support the SAUVONS LA RECHERCHE movement: http://recherche-en-danger.apinc.org/

Dr. Pierre VALIRON
Laboratoire d'Astrophysique, Observatoire de Grenoble / UJF
BP 53, F-38041 Grenoble Cedex 9 (France)
http://www-laog.obs.ujf-grenoble.fr/~valiron/
Mail: pierre.vali...@obs.ujf-grenoble.fr
Phone: +33 4 7651 4787   Fax: +33 4 7644 8821

all.f.gz  Description: GNU Zip compressed data
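
On point b), one quick way to tell whether output is being held back by the language runtime or by mpirun's I/O forwarding is to flush after every write. A minimal C sketch of the idea (Pierre's demo is Fortran, where a flush of unit 6 plays the same role; this is a generic illustration, not his program):

  /* Flush after every write so any remaining delay can be blamed on
   * the launcher's output forwarding rather than stdio buffering. */
  #include <mpi.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      int rank, size, i;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      for (i = 0; i < 5; i++) {
          printf("rank %d of %d: step %d\n", rank, size, i);
          fflush(stdout);      /* push the line out immediately */
          sleep(1);            /* give the forwarding time to show up */
      }

      MPI_Finalize();
      return 0;
  }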
[OMPI users] Myrinet on linux cluster
Hi,

I am trying to install OPENMPI on a Linux cluster with 22 dual Opteron nodes and a Myrinet interconnect. I am having trouble with the build with the GM libraries. I configured with:

  ./configure --prefix-/users/rosmond/ompi --with-gm=/usr/lib64 --enable-mpi2-one-sided

and the environmental variables:

  setenv FC pgf90
  setenv F77 pgf90
  setenv CCPFLAGS /usr/include/gm    ! (note this non-standard location)

The configure seemed to go OK, but the make failed. As you see at the end of the make output, it doesn't like the format of libgm.so. It looks to me that it is using a path (/usr/lib/.) to 32 bit libraries, rather than 64 bit (/usr/lib64/). Is this correct? What's the solution?

Tom Rosmond

config.log.bz2  Description: BZip2 compressed data
config_out.bz2  Description: BZip2 compressed data
make_out.bz2  Description: BZip2 compressed data
Re: [OMPI users] [Fwd: MPI_SEND blocks when crossing node boundary]
Please note that I replied to your original post:

  http://www.open-mpi.org/community/lists/users/2006/02/0712.php

Was that not sufficient? If not, please provide more details on what you are attempting to do and what is occurring.

Thanks.

On Mar 7, 2006, at 2:36 PM, Cezary Sliwa wrote:

> Hello again,
>
> The problem is that MPI_SEND blocks forever (the message is still not delivered after many hours).
>
> Cezary Sliwa
>
> From: Cezary Sliwa
> Date: February 22, 2006 10:07:04 AM EST
> To: us...@open-mpi.org
> Subject: MPI_SEND blocks when crossing node boundary
>
> My program runs fine with openmpi-1.0.1 when run from the command line (5 processes with an empty host file), but when I schedule it with qsub to run on 2 nodes it blocks on MPI_SEND:
>
>   (gdb) info stack
>   #0  0x0034db30c441 in __libc_sigaction () from /lib64/tls/libpthread.so.0
>   #1  0x00573002 in opal_evsignal_recalc ()
>   #2  0x00582a3c in poll_dispatch ()
>   #3  0x005729f2 in opal_event_loop ()
>   #4  0x00577e68 in opal_progress ()
>   #5  0x004eed4a in mca_pml_ob1_send ()
>   #6  0x0049abdd in PMPI_Send ()
>   #7  0x00499dc0 in pmpi_send__ ()
>   #8  0x0042d5d8 in MAIN__ () at main.f:90
>   #9  0x005877de in main (argc=Variable "argc" is not available.)

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
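
In the meantime, one common cause of this symptom (a general observation, not a diagnosis of Cezary's code) is that once a message exceeds the eager limit, MPI_Send blocks until the receiver posts a matching receive; two ranks that both send first will then deadlock across nodes even though small messages "work". A minimal sketch of the pattern and the usual fix:

  /* Sketch of a send/send deadlock and the MPI_Sendrecv fix.
   * Run with an even number of ranks; N is chosen to exceed a
   * typical eager limit so MPI_Send really blocks. */
  #include <mpi.h>
  #include <stdlib.h>

  #define N (1 << 20)

  int main(int argc, char **argv)
  {
      int rank, peer;
      double *sendbuf = malloc(N * sizeof(double));
      double *recvbuf = malloc(N * sizeof(double));

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      peer = rank ^ 1;   /* pair up ranks 0<->1, 2<->3, ... */

      /* Deadlock-prone: both ranks block in MPI_Send waiting for the
       * other side to post a receive that never comes:
       *   MPI_Send(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
       *   MPI_Recv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
       *            MPI_STATUS_IGNORE);
       *
       * Safe: let MPI pair the two operations. */
      MPI_Sendrecv(sendbuf, N, MPI_DOUBLE, peer, 0,
                   recvbuf, N, MPI_DOUBLE, peer, 0,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      free(sendbuf);
      free(recvbuf);
      MPI_Finalize();
      return 0;
  }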
Re: [OMPI users] MPI for DSP
On Mar 6, 2006, at 10:19 PM, 赖俊杰 wrote:

> Hello everyone, I'm a research assistant at Tsinghua University, and I am now beginning to study MPI for DSPs. Can anybody tell me something about this field?

If you're looking for an embedded MPI implementation, Open MPI is not for you. You might want to google around for one -- I know that there was a commercial one for at least some period of time (I have no idea if it still exists or not).

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
[OMPI users] Open MPI and MultiRail InfiniBand
I've got a machine that has the following config. Each node has two InfiniBand ports:

* The first port is on fabric 'a' with switches for 'a'
* The second port is on fabric 'b' with separate switches for 'b'
* The two fabrics are not shared ('a' and 'b' can't communicate with one another)

I believe that Open MPI is perfectly capable of striping over both fabric 'a' and 'b', and IIRC, this is the default behavior.

Does Open MPI handle the case where Open MPI puts all of its traffic on the first IB port (ie. fabric 'a'), and leaves the second IB port (ie. fabric 'b') free for other uses (I'll use NFS as a humorous example)? If so, is there any magic required to configure it thusly?

Troy Telford
Re: [OMPI users] Myrinet on linux cluster
> The configure seemed to go OK, but the make failed. As you see at the end of the make output, it doesn't like the format of libgm.so. It looks to me that it is using a path (/usr/lib/.) to 32 bit libraries, rather than 64 bit (/usr/lib64/). Is this correct? What's the solution?

First things first: does it compile okay with gcc? I say this because PGI's compiler has proven stubborn from time to time: I can compile Open MPI with PGI 6.0 just fine, but PGI 6.1 won't compile for me either (for different reasons, though -- I posted my problem earlier this week).

That being said: the distros get mixed in my mind, so I'm not sure whether yours is one that:

a.) puts 32-bit libs in /lib32 and /usr/lib32, with 64-bit libs in /lib64 and /usr/lib64 (and /lib points to lib64), or
b.) puts 32-bit libs in /lib and /usr/lib, with 64-bit libs in /lib64 and /usr/lib64.

If your machine is a 'b', then yes, it does appear to be trying (and failing) to use a 32-bit libgm.so.

The first thing I'd do is make sure you have a 64-bit version of libgm.so; at least that is what I suspect. Locate all instances of libgm.so, run 'file libgm.so' to ensure one of 'em is 64-bit, and check that the 64-bit library is in a path where the linker can find it (ld.so.conf or LD_LIBRARY_PATH).

--
Troy Telford
Re: [OMPI users] Myrinet on linux cluster
Troy Telford wrote:

>> The configure seemed to go OK, but the make failed. As you see at the end of the make output, it doesn't like the format of libgm.so. It looks to me that it is using a path (/usr/lib/.) to 32 bit libraries, rather than 64 bit (/usr/lib64/). Is this correct? What's the solution?
>
> First things first: does it compile okay with gcc?

I'm not sure I understand, and besides, I am strictly a Fortran guy. However, I have made a successful build on this system without 'gm' support, but that is not very interesting because its executables only run on the interactive node. Therefore I don't think it's a Fortran compiler problem, especially since there is already an MPICH/PGI combination running on the system.

> I say this because PGI's compiler has proven stubborn from time to time: I can compile Open MPI with PGI 6.0 just fine, but PGI 6.1 won't compile for me either (different reasons, though -- I posted my problem earlier this week).
>
> That being said: the distros get mixed in my mind, so I'm not sure if yours is one that:
>
> a.) Puts 32-bit libs in /lib32 and /usr/lib32, with 64-bit libs in /lib64 and /usr/lib64 (and /lib points to lib64)
> b.) 32-bit libs are in /lib and /usr/lib, and 64-bit are in /lib64 and /usr/lib64
>
> If your machine is a 'b' then yes, it does appear to be trying (and failing) to use a 32-bit libgm.so

The answer is 'b'.

> The first thing I'd do is make sure you have a 64-bit version of libgm.so; at least that is what I suspect. Locate all instances of libgm.so, run 'file libgm.so' to ensure one of 'em is 64-bit, and that the 64-bit library is in a path where the linker can find it (ld.so.conf or LD_LIBRARY_PATH).

I checked, and /usr/lib64/libgm.so is definitely a 64-bit library, and I am sure that /usr/lib64 is by default in a path where the linker looks, since this is a native 64-bit (Opteron) system. Just to be sure, however, I added /usr/lib64 to LD_LIBRARY_PATH, with the same results.
Re: [OMPI users] OpenMPI 1.0.x and PGI pgf90
On Mar 3, 2006, at 10:50 AM, Troy Telford wrote:

> On Thu, 02 Mar 2006 03:55:46 -0700, Jeff Squyres wrote:
>
>>> That being said, I have been unable to get OpenMPI to compile with PGI 6.1 (but it does finish ./configure; it breaks during 'make').
>>
>> Can you provide some details on what is going wrong? We currently only have PGI 5.2 and 6.0 to test with.
>
> No. I refuse :p
>
> Attached is a tar.bz2 with the config.log and the output of 'make'. I wouldn't doubt it if it's just a problem with the way I have PGI 6.1 set up; I just haven't had time to investigate it yet.

I think I have this fixed on the trunk. It looks like PGI tried to make the 6.1 compilers support GCC inline assembly, but it doesn't look like it's 100% correct, so for now we have disabled our inline assembly support with PGI 6.1; it will use the non-inlined version, just like the other versions of the PGI compilers. Any tarball on the trunk after r9240 should have the fix. I've asked that this gets pushed into the 1.0 branch to become part of Open MPI 1.0.2.

Brian

--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/
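
For reference, the kind of configure-time probe involved in deciding whether a compiler really understands GCC-style extended inline assembly is roughly this small (a sketch of my own, not Open MPI's actual configure test):

  /* If the compiler genuinely supports gcc extended asm, this stores 0
   * into ret and the program exits 0; a compiler that only half-supports
   * the syntax typically rejects the constraint list at compile time. */
  int main(void)
  {
      int ret = 1;

      __asm__ __volatile__("movl $0, %0" : "=r" (ret) : : "memory");

      return ret;
  }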
Re: [OMPI users] OpenMPI 1.0.x and PGI pgf90
On Thu, Mar 09, 2006 at 09:13:46PM -0500, Brian Barrett wrote:

> I think I have this fixed on the trunk. It looks like PGI tried to make the 6.1 compilers support GCC inline assembly, but it doesn't look like it's 100% correct, ...

and that's no surprise. The spec in the gcc info pages doesn't reflect reality, and with our compiler, I filed 20 bugs before we got gmp (gnu multi-precision library, a heavy user of inline assembly) to work.

Doctor, it hurts when I do this...

-- greg
Re: [OMPI users] OpenMPI 1.0.x and PGI pgf90
On Mar 9, 2006, at 9:28 PM, Greg Lindahl wrote:

> On Thu, Mar 09, 2006 at 09:13:46PM -0500, Brian Barrett wrote:
>
>> I think I have this fixed on the trunk. It looks like PGI tried to make the 6.1 compilers support GCC inline assembly, but it doesn't look like it's 100% correct, ...
>
> and that's no surprise. The spec in the gcc info pages doesn't reflect reality, and with our compiler, I filed 20 bugs before we got gmp (gnu multi-precision library, a heavy user of inline assembly) to work.
>
> Doctor, it hurts when I do this...

Yes, the inline assembly is my second least favorite part of the Open MPI code base. And we don't even do anything very complicated with our inline assembly (memory barriers on platforms that need them, spinlocks, and atomic add). The part I found interesting is that this is the only compiler I've run into to date where the C compiler handled the super-simple test properly and the C++ compiler did not. Oh well, it works well enough for our purposes, so on to more broken things.

The least favorite, of course, is the games we have to play to deal with free() and pinned-memory caching. But that's a different story altogether...

Brian

--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/
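
For a sense of scale, the primitives in question are about this small. An x86 sketch of my own (illustrative only, not Open MPI's actual opal assembly code):

  #include <stdint.h>

  /* Fetch-and-add via "lock xadd": atomically adds delta to *addr and
   * returns the new value. Sketch of the style of primitive discussed. */
  static inline int32_t atomic_add_32(volatile int32_t *addr, int32_t delta)
  {
      int32_t old = delta;

      __asm__ __volatile__("lock; xaddl %0, %1"
                           : "+r" (old), "+m" (*addr)
                           :
                           : "memory");

      return old + delta;
  }

  /* Full memory barrier on x86 / x86-64. */
  static inline void memory_barrier(void)
  {
      __asm__ __volatile__("mfence" : : : "memory");
  }

The constraint strings ("+r", "+m", the "memory" clobber) are exactly the part of the GCC extension that other compilers tend to reimplement incompletely.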
Re: [OMPI users] Myrinet on linux cluster
On Mar 9, 2006, at 2:51 PM, Tom Rosmond wrote:

> I am trying to install OPENMPI on a Linux cluster with 22 dual Opteron nodes and a Myrinet interconnect. I am having trouble with the build with the GM libraries. I configured with:
>
>   ./configure --prefix-/users/rosmond/ompi --with-gm=/usr/lib64 --enable-mpi2-one-sided

Can you try configuring with --with-gm (no argument) and send the output from configure and make again? The --with-gm flag takes as an argument the installation prefix, not the library prefix. So in this case, it would be --with-gm=/usr, which is kind of pointless, as that's a default search location anyway. Open MPI's configure script should automatically look in /usr/lib64. In fact, it looks like configure looked there and found the right libgm, but something went amuck later in the process.

Also, you really don't want to configure with the --enable-mpi2-one-sided flag. It will not do anything useful and will likely cause very bad things to happen. Open MPI 1.0.x does not have any MPI-2 onesided support. Open MPI 1.1 should have a complete implementation of the onesided chapter.

> and the environmental variables:
>
>   setenv FC pgf90
>   setenv F77 pgf90
>   setenv CCPFLAGS /usr/include/gm    ! (note this non-standard location)

I assume you mean CPPFLAGS=-I/usr/include/gm, which shouldn't cause any problems.

> The configure seemed to go OK, but the make failed. As you see at the end of the make output, it doesn't like the format of libgm.so. It looks to me that it is using a path (/usr/lib/.) to 32 bit libraries, rather than 64 bit (/usr/lib64/). Is this correct? What's the solution?

I'm not sure at this point, but I need a build without the incorrect flag to be able to determine what went wrong. We've built Open MPI with 64 bit builds of GM before, so I'm surprised there were any problems...

Thanks,

Brian

--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/
Re: [OMPI users] Open MPI and MultiRail InfiniBand
On Mar 9, 2006, at 6:41 PM, Troy Telford wrote:

> I've got a machine that has the following config. Each node has two InfiniBand ports:
>
> * The first port is on fabric 'a' with switches for 'a'
> * The second port is on fabric 'b' with separate switches for 'b'
> * The two fabrics are not shared ('a' and 'b' can't communicate with one another)
>
> I believe that Open MPI is perfectly capable of striping over both fabric 'a' and 'b', and IIRC, this is the default behavior.
>
> Does Open MPI handle the case where Open MPI puts all of its traffic on the first IB port (ie. fabric 'a'), and leaves the second IB port (ie. fabric 'b') free for other uses (I'll use NFS as a humorous example)? If so, is there any magic required to configure it thusly?

With mvapi, we don't have the functionality in place for the user to specify which HCA port is used. The user can say that at most N HCA ports should be used through the btl_mvapi_max_btls MCA parameter. So in your case, if you ran Open MPI with:

  mpirun -mca btl_mvapi_max_btls 1 -np X ./foobar

only the first active port would be used for mvapi communication. I'm not sure if this is enough for your needs or not.

Hope this helps,

Brian

--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/
Re: [OMPI users] Myrinet on linux cluster
Attached are output files from a build with the adjustments you suggested:

  setenv FC pgf90
  setenv F77 pgf90
  setenv CCPFLAGS -I/usr/include/gm

  ./configure --prefix=/users/rosmond/ompi --with-gm

The results are the same.

P.S. I understand that the mpi2 option is just a dummy. I use it because I am porting a code from an SGI Origin, which has full MPI-2 one-sided support. This option makes it unnecessary to add my own dummy MPI-2 routines to my source. My code has both MPI-1 and MPI-2 message-passing options, so it's one of the reasons I like OPENMPI over MPICH.

Brian Barrett wrote:
> Can you try configuring with --with-gm (no argument) and send the output from configure and make again? The --with-gm flag takes as an argument the installation prefix, not the library prefix. [...] I'm not sure at this point, but I need a build without the incorrect flag to be able to determine what went wrong. We've built Open MPI with 64 bit builds of GM before, so I'm surprised there were any problems...

config.log.bz2  Description: BZip2 compressed data
config_out.bz2  Description: BZip2 compressed data
makeall_out.bz2  Description: BZip2 compressed data
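
For readers who haven't met them, the MPI-2 one-sided calls being discussed look roughly like this (a generic sketch of the API, not Tom's code; these are the symbols a library without one-sided support would leave unresolved at link time):

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size;
      double local[100] = {0.0};
      double window_buf[100] = {0.0};
      MPI_Win win;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      /* Expose window_buf on every rank for one-sided access. */
      MPI_Win_create(window_buf, 100 * sizeof(double), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &win);

      MPI_Win_fence(0, win);
      if (rank == 0 && size > 1) {
          local[0] = 42.0;
          /* Write into rank 1's window; rank 1 makes no matching call. */
          MPI_Put(local, 100, MPI_DOUBLE, 1, 0, 100, MPI_DOUBLE, win);
      }
      MPI_Win_fence(0, win);

      if (rank == 1)
          printf("rank 1 received %g via MPI_Put\n", window_buf[0]);

      MPI_Win_free(&win);
      MPI_Finalize();
      return 0;
  }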