As far as the nightly builds go, I'm still seeing what I believe to be this problem in both r10670 and r10652, on both Linux and OS X. Below are the systems and the ompi_info output for the newest revision, r10670.
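(For what it's worth, a quick way to confirm which revision a given node actually picks up is a one-liner along these lines; the prefix is the one from my Linux build below, so adjust it for your own install.)

    /usr/local/ompi-gnu-1.1.1a/bin/ompi_info | grep "SVN revision"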
As an example of the failure, when running HPL over Myrinet I get the residual-check errors below. Using tcp, everything is fine and I see the results I'd expect.

----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 42820214496954887558164928727596662784.0000000 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 156556068835.2711182 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1156439380.5172558 ...... FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 272683853978565028754868928512.000000
||A||_oo . . . . . . . . . . . . . . . . . . . = 3822.884181
||A||_1  . . . . . . . . . . . . . . . . . . . = 3823.922627
||x||_oo . . . . . . . . . . . . . . . . . . . = 37037692483529688659798261760.000000
||x||_1  . . . . . . . . . . . . . . . . . . . = 4102704048669982798475494948864.000000
===================================================
Finished 1 tests with the following results:
    0 tests completed and passed residual checks,
    1 tests completed and failed residual checks,
    0 tests skipped because of illegal input values.
----------------------------------------------------------------------------

Linux node41 2.6.16.19 #1 SMP Wed Jun 21 17:22:01 EDT 2006 ppc64 PPC970FX, altivec supported GNU/Linux

jbronder@node41 ~ $ /usr/local/ompi-gnu-1.1.1a/bin/ompi_info
Open MPI: 1.1.1a1r10670
Open MPI SVN revision: r10670
Open RTE: 1.1.1a1r10670
Open RTE SVN revision: r10670
OPAL: 1.1.1a1r10670
OPAL SVN revision: r10670
Prefix: /usr/local/ompi-gnu-1.1.1a
Configured architecture: powerpc64-unknown-linux-gnu
Configured by: root
Configured on: Thu Jul 6 10:15:37 EDT 2006
Configure host: node41
Built by: root
Built on: Thu Jul 6 10:28:14 EDT 2006
Built host: node41
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: gfortran
Fortran77 compiler abs: /usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
Fortran90 compiler: gfortran
Fortran90 compiler abs: /usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: no
Thread support: posix (mpi: no, progress: no)
Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1.1)
MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.1)
MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.1)
MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.1)
MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.1)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.1)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.1)
MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: gm (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: self (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.1)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.1)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1.1)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.1)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.1)
MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1.1)
MCA ns: replica (MCA v1.0, API v1.0, Component v1.1.1)
MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1.1)
MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1.1)
MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1.1)
MCA ras: tm (MCA v1.0, API v1.0, Component v1.1.1)
MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1.1)
MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1.1)
MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1.1)
MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1.1)
MCA rml: oob (MCA v1.0, API v1.0, Component v1.1.1)
MCA pls: fork (MCA v1.0, API v1.0, Component v1.1.1)
MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1.1)
MCA pls: tm (MCA v1.0, API v1.0, Component v1.1.1)
MCA sds: env (MCA v1.0, API v1.0, Component v1.1.1)
MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1.1)
MCA sds: seed (MCA v1.0, API v1.0, Component v1.1.1)
MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1.1)

Configured as:
./configure \
    --prefix=$PREFIX \
    --enable-mpi-f77 \
    --enable-mpi-f90 \
    --enable-mpi-profile \
    --enable-mpi-cxx \
    --enable-pty-support \
    --enable-shared \
    --enable-smp-locks \
    --enable-io-romio \
    --with-tm=/usr/local/pbs \
    --without-xgrid \
    --without-slurm \
    --with-gm=/opt/gm

Darwin node90.meldrew.clusters.umaine.edu 8.6.0 Darwin Kernel Version 8.6.0: Tue Mar 7 16:58:48 PST 2006; root:xnu-792.6.70.obj~1/RELEASE_PPC Power Macintosh powerpc

node90:~/src/hpl jbronder$ /usr/local/ompi-xl/bin/ompi_info
Open MPI: 1.1.1a1r10670
Open MPI SVN revision: r10670
Open RTE: 1.1.1a1r10670
Open RTE SVN revision: r10670
OPAL: 1.1.1a1r10670
OPAL SVN revision: r10670
Prefix: /usr/local/ompi-xl
Configured architecture: powerpc-apple-darwin8.6.0
Configured by:
Configured on: Thu Jul 6 10:05:20 EDT 2006
Configure host: node90.meldrew.clusters.umaine.edu
Built by: root
Built on: Thu Jul 6 10:37:40 EDT 2006
Built host: node90.meldrew.clusters.umaine.edu
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (lower case)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: /opt/ibmcmp/vac/6.0/bin/xlc
C compiler absolute: /opt/ibmcmp/vac/6.0/bin/xlc
C++ compiler: /opt/ibmcmp/vacpp/6.0/bin/xlc++
C++ compiler absolute: /opt/ibmcmp/vacpp/6.0/bin/xlc++
Fortran77 compiler: /opt/ibmcmp/xlf/8.1/bin/xlf_r
Fortran77 compiler abs: /opt/ibmcmp/xlf/8.1/bin/xlf_r
Fortran90 compiler: /opt/ibmcmp/xlf/8.1/bin/xlf90_r
Fortran90 compiler abs: /opt/ibmcmp/xlf/8.1/bin/xlf90_r
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: no
Thread support: posix (mpi: no, progress: no)
Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
MCA memory: darwin (MCA v1.0, API v1.0, Component v1.1.1)
MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.1)
MCA timer: darwin (MCA v1.0, API v1.0, Component v1.1.1)
MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.1)
MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.1)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.1)
MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: self (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: gm (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.1)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.1)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1.1)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.1)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.1)
MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1.1)
MCA ns: replica (MCA v1.0, API v1.0, Component v1.1.1)
MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1.1)
MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1.1)
MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1.1)
MCA ras: tm (MCA v1.0, API v1.0, Component v1.1.1)
MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1.1)
MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1.1)
MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1.1)
MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1.1)
MCA rml: oob (MCA v1.0, API v1.0, Component v1.1.1)
MCA pls: fork (MCA v1.0, API v1.0, Component v1.1.1)
MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1.1)
MCA pls: tm (MCA v1.0, API v1.0, Component v1.1.1)
MCA sds: env (MCA v1.0, API v1.0, Component v1.1.1)
MCA sds: seed (MCA v1.0, API v1.0, Component v1.1.1)
MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1.1)
MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1.1)

Configured as:
./configure \
    --prefix=$PREFIX \
    --with-tm=/usr/local/pbs/ \
    --with-gm=/opt/gm \
    --enable-static \
    --disable-cxx
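For completeness, this is roughly how the gm and tcp cases get selected with the 1.1-series MCA options; the process count and the path to the HPL binary are placeholders rather than my exact job script:

    # Myrinet/GM run -- fails the residual checks:
    mpirun -np 16 --mca btl gm,sm,self ./xhpl

    # TCP-only run -- passes:
    mpirun -np 16 --mca btl tcp,sm,self ./xhpl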
On 7/3/06, George Bosilca <bosi...@cs.utk.edu> wrote:

Bernard,

A bug in the Open MPI GM driver was discovered after the 1.1 release. A patch for 1.1 is on the way; however, I don't know whether it will be available before 1.1.1. Meanwhile, you can use the nightly build version or a fresh check-out from the SVN repository. Both of them have the GM bug corrected.

Sorry for the trouble,
  george.

On Jul 3, 2006, at 12:58 PM, Borenstein, Bernard S wrote:

> I've built and successfully run the NASA Overflow 2.0aa program with
> Open MPI 1.0.2. I'm running on an Opteron Linux cluster running SLES 9
> and GM 2.0.24. I built Open MPI 1.1 with the Intel 9 compilers, and when
> I try to run Overflow 2.0aa with Myrinet I get what looks like a data
> corruption error and the program dies quickly. There are no MPI errors
> at all. If I run using GigE (--mca btl self,tcp), the program runs to
> completion correctly. Here is my ompi_info output:
>
> bsb3227@mahler:~/openmpi_1.1/bin> ./ompi_info
> Open MPI: 1.1
> Open MPI SVN revision: r10477
> Open RTE: 1.1
> Open RTE SVN revision: r10477
> OPAL: 1.1
> OPAL SVN revision: r10477
> Prefix: /home/bsb3227/openmpi_1.1
> Configured architecture: x86_64-unknown-linux-gnu
> Configured by: bsb3227
> Configured on: Fri Jun 30 07:08:54 PDT 2006
> Configure host: mahler
> Built by: bsb3227
> Built on: Fri Jun 30 07:54:46 PDT 2006
> Built host: mahler
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: icc
> C compiler absolute: /opt/intel/cce/9.0.25/bin/icc
> C++ compiler: icpc
> C++ compiler absolute: /opt/intel/cce/9.0.25/bin/icpc
> Fortran77 compiler: ifort
> Fortran77 compiler abs: /opt/intel/fce/9.0.25/bin/ifort
> Fortran90 compiler: /opt/intel/fce/9.0.25/bin/ifort
> Fortran90 compiler abs: /opt/intel/fce/9.0.25/bin/ifort
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: no, progress: no)
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1)
> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1)
> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
> MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1)
> MCA timer: linux (MCA v1.0, API v1.0, Component v1.1)
> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
> MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
> MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
> MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1)
> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
> MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
> MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
> MCA btl: gm (MCA v1.0, API v1.0, Component v1.1)
> MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
> MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
> MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
> MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
> MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)
> MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)
> MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)
> MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1)
> MCA ras: tm (MCA v1.0, API v1.0, Component v1.1)
> MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1)
> MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)
> MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1)
> MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1)
> MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1)
> MCA rml: oob (MCA v1.0, API v1.0, Component v1.1)
> MCA pls: fork (MCA v1.0, API v1.0, Component v1.1)
> MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1)
> MCA pls: slurm (MCA v1.0, API v1.0, Component v1.1)
> MCA pls: tm (MCA v1.0, API v1.0, Component v1.1)
> MCA sds: env (MCA v1.0, API v1.0, Component v1.1)
> MCA sds: seed (MCA v1.0, API v1.0, Component v1.1)
> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1)
> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1)
> MCA sds: slurm (MCA v1.0, API v1.0, Component v1.1)
>
> Here is the ifconfig output for one of the nodes:
>
> bsb3227@m045:~> /sbin/ifconfig
> eth0  Link encap:Ethernet  HWaddr 00:50:45:5D:CD:FE
>       inet addr:10.241.194.45  Bcast:10.241.195.255  Mask:255.255.254.0
>       inet6 addr: fe80::250:45ff:fe5d:cdfe/64 Scope:Link
>       UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
>       RX packets:39913407 errors:0 dropped:0 overruns:0 frame:0
>       TX packets:48794587 errors:0 dropped:0 overruns:0 carrier:0
>       collisions:0 txqueuelen:1000
>       RX bytes:31847343907 (30371.9 Mb)  TX bytes:48231713866 (45997.3 Mb)
>       Interrupt:19
>
> eth1  Link encap:Ethernet  HWaddr 00:50:45:5D:CD:FF
>       inet6 addr: fe80::250:45ff:fe5d:cdff/64 Scope:Link
>       UP BROADCAST MULTICAST  MTU:1500  Metric:1
>       RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>       TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>       collisions:0 txqueuelen:1000
>       RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>       Interrupt:19
>
> lo    Link encap:Local Loopback
>       inet addr:127.0.0.1  Mask:255.0.0.0
>       inet6 addr: ::1/128 Scope:Host
>       UP LOOPBACK RUNNING  MTU:16436  Metric:1
>       RX packets:23141 errors:0 dropped:0 overruns:0 frame:0
>       TX packets:23141 errors:0 dropped:0 overruns:0 carrier:0
>       collisions:0 txqueuelen:0
>       RX bytes:20145689 (19.2 Mb)  TX bytes:20145689 (19.2 Mb)
>
> I hope someone can give me some guidance on how to debug this problem.
> Thanks in advance for any help that can be provided.
>
> Bernie Borenstein
> The Boeing Company
>
> <config.log.gz>

"Half of what I say is meaningless; but I say it so that the other half may reach you" -- Kahlil Gibran
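(For reference, the nightlies mentioned at the top were pulled and rebuilt with roughly the sequence below. The tarball name and URL are illustrative only, so take the current snapshot from the Open MPI nightly download page; the configure flags are an abbreviated form of the Linux build above.)

    # fetch and unpack a 1.1 nightly snapshot (filename and URL are illustrative)
    wget http://www.open-mpi.org/nightly/v1.1/openmpi-1.1.1a1r10670.tar.gz
    tar xzf openmpi-1.1.1a1r10670.tar.gz
    cd openmpi-1.1.1a1r10670

    # configure with (an abbreviated set of) the same flags as the Linux build above
    ./configure --prefix=/usr/local/ompi-gnu-1.1.1a \
        --with-tm=/usr/local/pbs --with-gm=/opt/gm \
        --enable-shared --enable-io-romio --without-xgrid --without-slurm
    make all install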