Re: [O-MPI users] Hpl Bench mark and Openmpi rc3
On Oct 13, 2005, at 1:25 AM, Allan Menezes wrote:

> I have a 16 node cluster of x86 machines with FC3 running on the head
> node. I used a beta version of OSCAR 4.2 for putting together the
> cluster. It uses /home/allan as the NFS directory.

Greetings Allan. Sorry for the delay in replying -- we were all at an Open MPI working meeting last week, and the schedule got a bit hectic.

Your setup sounds fine.

> I tried MPICH2 v1.02p1 and got a benchmark of approximately 26 GFlops
> for it. With Open MPI 1.0RC3, having set LD_LIBRARY_PATH in .bashrc and
> the /opt/openmpi/bin path in .bash_profile in the home directory

Two quick notes here:

- Open MPI's mpirun supports the "--prefix" option, which removes the need to set these variables in your .bashrc (although setting them permanently makes things easier in the long term -- you don't need to specify --prefix every time). See the FAQ for more details on the --prefix option: http://www.open-mpi.org/faq/?category=running#mpirun-prefix

- OSCAR makes use of environment modules; it includes setup to differentiate between the multiple MPI implementations that OSCAR contains. You can trivially add a modulefile for Open MPI and then use the "switcher" command to switch easily between all the MPI implementations on your OSCAR cluster (once we hit 1.0, we anticipate having an OSCAR package).

> I cannot seem to get performance beyond approximately 9 GFlops. The
> block size for MPICH2 was 120 for best results. For Open MPI, with
> N = 22000 I have to use block sizes of 10-11 to get 9 GFlops;
> otherwise, for larger block sizes (NB), it is worse. I used the same
> N = 22000 for MPICH2, and I have a 16-port Netgear Gigabit Ethernet
> switch with Realtek 8169 Gigabit Ethernet cards. Can anyone tell me why
> the performance with Open MPI is so low compared to MPICH2 v1.02p1?

There should clearly not be such a wide disparity in performance here; we don't see this kind of difference in our own internal testing. Can you send the output of "ompi_info --all"?

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
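To illustrate the two notes above, a minimal sketch (the install prefix /opt/openmpi, the hostfile name "aa", and the process count are assumptions taken from elsewhere in this thread, not verified on the poster's cluster):

    # --prefix tells mpirun where Open MPI is installed, so it can set PATH
    # and LD_LIBRARY_PATH itself when launching on the remote nodes:
    mpirun --prefix /opt/openmpi -hostfile aa -np 16 ./xhpl

and a correspondingly minimal, hypothetical environment-modules file for the same installation:

    #%Module1.0
    # Hypothetical modulefile for an Open MPI installed under /opt/openmpi
    prepend-path PATH            /opt/openmpi/bin
    prepend-path LD_LIBRARY_PATH /opt/openmpi/lib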
[O-MPI users] OpenMPI hang issue
Greetings,

I have been testing OpenMPI 1.0rc3 on a rack of 8 2-processor (single core) Opteron systems connected via both Gigabit Ethernet and Myrinet. My testing has been mostly successful, although I have run into a recurring issue on a few MPI applications. The symptom is that the computation seems to progress nearly to completion, and then suddenly just hangs without terminating.

One code that demonstrates this is the Tachyon parallel raytracer, available at: http://jedi.ks.uiuc.edu/~johns/raytracer/

I am using PGI 6.0-5 to compile OpenMPI, so that may be part of the root cause of this particular problem. I have attached the output of config.log to this message.

Here is the output from ompi_info:

    Open MPI: 1.0rc3r7730
    Open MPI SVN revision: r7730
    Open RTE: 1.0rc3r7730
    Open RTE SVN revision: r7730
    OPAL: 1.0rc3r7730
    OPAL SVN revision: r7730
    Prefix: /opt/openmpi-1.0rc3-pgi-6.0
    Configured architecture: x86_64-unknown-linux-gnu
    Configured by: root
    Configured on: Mon Oct 17 10:10:28 PDT 2005
    Configure host: castor00
    Built by: root
    Built on: Mon Oct 17 10:29:20 PDT 2005
    Built host: castor00
    C bindings: yes
    C++ bindings: yes
    Fortran77 bindings: yes (all)
    Fortran90 bindings: yes
    C compiler: pgcc
    C compiler absolute: /net/lisbon/opt/pgi-6.0-5/linux86-64/6.0/bin/pgcc
    C++ compiler: pgCC
    C++ compiler absolute: /net/lisbon/opt/pgi-6.0-5/linux86-64/6.0/bin/pgCC
    Fortran77 compiler: pgf77
    Fortran77 compiler abs: /net/lisbon/opt/pgi-6.0-5/linux86-64/6.0/bin/pgf77
    Fortran90 compiler: pgf90
    Fortran90 compiler abs: /net/lisbon/opt/pgi-6.0-5/linux86-64/6.0/bin/pgf90
    C profiling: yes
    C++ profiling: yes
    Fortran77 profiling: yes
    Fortran90 profiling: yes
    C++ exceptions: no
    Thread support: posix (mpi: no, progress: no)
    Internal debug support: no
    MPI parameter check: runtime
    Memory profiling support: no
    Memory debugging support: no
    libltdl support: 1
    MCA memory: malloc_hooks (MCA v1.0, API v1.0, Component v1.0)
    MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.0)
    MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.0)
    MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.0)
    MCA timer: linux (MCA v1.0, API v1.0, Component v1.0)
    MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
    MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
    MCA coll: basic (MCA v1.0, API v1.0, Component v1.0)
    MCA coll: self (MCA v1.0, API v1.0, Component v1.0)
    MCA coll: sm (MCA v1.0, API v1.0, Component v1.0)
    MCA io: romio (MCA v1.0, API v1.0, Component v1.0)
    MCA mpool: gm (MCA v1.0, API v1.0, Component v1.0)
    MCA mpool: sm (MCA v1.0, API v1.0, Component v1.0)
    MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.0)
    MCA pml: teg (MCA v1.0, API v1.0, Component v1.0)
    MCA pml: uniq (MCA v1.0, API v1.0, Component v1.0)
    MCA ptl: gm (MCA v1.0, API v1.0, Component v1.0)
    MCA ptl: self (MCA v1.0, API v1.0, Component v1.0)
    MCA ptl: sm (MCA v1.0, API v1.0, Component v1.0)
    MCA ptl: tcp (MCA v1.0, API v1.0, Component v1.0)
    MCA btl: gm (MCA v1.0, API v1.0, Component v1.0)
    MCA btl: self (MCA v1.0, API v1.0, Component v1.0)
    MCA btl: sm (MCA v1.0, API v1.0, Component v1.0)
    MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
    MCA topo: unity (MCA v1.0, API v1.0, Component v1.0)
    MCA gpr: null (MCA v1.0, API v1.0, Component v1.0)
    MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.0)
    MCA gpr: replica (MCA v1.0, API v1.0, Component v1.0)
    MCA iof: proxy (MCA v1.0, API v1.0, Component v1.0)
    MCA iof: svc (MCA v1.0, API v1.0, Component v1.0)
    MCA ns: proxy (MCA v1.0, API v1.0, Component v1.0)
    MCA ns: replica (MCA v1.0, API v1.0, Component v1.0)
    MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
    MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.0)
    MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.0)
    MCA ras: localhost (MCA v1.0, API v1.0, Component v1.0)
    MCA ras: slurm (MCA v1.0, API v1.0, Component v1.0)
    MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.0)
    MCA rds: resfile (MCA v1.0, API v1.0, Component v1.0)
    MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.0)
    MCA rmgr: proxy (MCA v
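As a general aid for debugging a hang of this kind (not specific to Open MPI; the process name below is a placeholder), attaching gdb to one of the stuck ranks on a compute node and dumping its stack usually shows which MPI call it is blocked in:

    # Attach to a hung rank and print a backtrace for every thread;
    # "tachyon" is a placeholder for the hung application's process name.
    gdb -batch -ex "thread apply all bt" -p $(pgrep -n tachyon)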
Re: [O-MPI users] OpenMPI hang issue
Hello Chris,

Please give the next release candidate a try. There was an issue with the GM port that was likely causing this.

Thanks,

Tim

Parrott, Chris wrote:

> Greetings,
>
> I have been testing OpenMPI 1.0rc3 on a rack of 8 2-processor (single
> core) Opteron systems connected via both Gigabit Ethernet and Myrinet.
> My testing has been mostly successful, although I have run into a
> recurring issue on a few MPI applications. The symptom is that the
> computation seems to progress nearly to completion, and then suddenly
> just hangs without terminating.
>
> One code that demonstrates this is the Tachyon parallel raytracer,
> available at: http://jedi.ks.uiuc.edu/~johns/raytracer/
>
> I am using PGI 6.0-5 to compile OpenMPI, so that may be part of the
> root cause of this particular problem. I have attached the output of
> config.log to this message.
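Until that release candidate is available, a minimal workaround sketch (the application name and hostfile are placeholders; this assumes the default ob1 PML with BTL transports) is to exclude the GM/Myrinet transport and run over TCP and shared memory only:

    # Restrict Open MPI to the TCP, shared-memory, and self BTLs so the
    # GM (Myrinet) transport is not used at all for this run:
    mpirun --mca btl tcp,sm,self -hostfile hosts -np 16 ./tachyon_app

If the hang disappears with GM excluded, that is consistent with the GM issue described above.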
Re: [O-MPI users] Hpl Bench mark and Openmpi rc3 (Jeff Squyres)
Jeff Squyres wrote:

> There should clearly not be such a wide disparity in performance here;
> we don't see this kind of difference in our own internal testing. Can
> you send the output of "ompi_info --all"?

Hi Jeff,

I installed two slightly different versions of Open MPI: one in /opt/openmpi (otherwise I would get the gfortran error) and the other in /home/allan/openmpi. However, I do not think that is the problem, as the path names are specified in the .bashrc and .bash_profile files in the /home/allan directory. I also log in as user allan, who is not a superuser.

To run Open MPI with HPL I use the following command line:

    a1> mpirun -mca pls_rsh_orted /home/allan/openmpi/bin/orted -hostfile aa -np 16 ./xhpl

from the directory where xhpl resides, such as /homer/open/bench. I use the -mca parameter pls_rsh_orted because otherwise it comes up with an error that it cannot find the orted daemon on machines a1, a2, etc. That is probably a configuration error. However, with the command above and the setup described, everything works fine and there are no errors in the HPL.out file, except that it is slow.

I use an ATLAS BLAS library for building xhpl from hpl.tar.gz. The make file for HPL uses the ATLAS libs and the Open MPI mpicc compiler for both compilation and linking, and I have zeroed out the MPI macro paths in Make.open (that's what I renamed the HPL makefile) for "make arch=open" in the HPL directory.

Please find attached the ompi_info --all output as requested.

Thank you very much,
Allan

    Open MPI: 1.0rc3r7730
    Open MPI SVN revision: r7730
    Open RTE: 1.0rc3r7730
    Open RTE SVN revision: r7730
    OPAL: 1.0rc3r7730
    OPAL SVN revision: r7730
    MCA memory: malloc_hooks (MCA v1.0, API v1.0, Component v1.0)
    MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.0)
    MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.0)
    MCA timer: linux (MCA v1.0, API v1.0, Component v1.0)
    MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
    MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
    MCA coll: basic (MCA v1.0, API v1.0, Component v1.0)
    MCA coll: self (MCA v1.0, API v1.0, Component v1.0)
    MCA coll: sm (MCA v1.0, API v1.0, Component v1.0)
    MCA io: romio (MCA v1.0, API v1.0, Component v1.0)
    MCA mpool: sm (MCA v1.0, API v1.0, Component v1.0)
    MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.0)
    MCA pml: teg (MCA v1.0, API v1.0, Component v1.
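As an aside on the Make.open described above: HPL's stock Make.<arch> template exposes MPdir/MPinc/MPlib variables for the MPI installation, and when the Open MPI wrapper compiler is used for both compiling and linking they can indeed be left empty. A hedged sketch of that section, assuming the /home/allan/openmpi installation mentioned earlier:

    # ---- Message Passing library (MPI) section of a hypothetical Make.open ----
    # Left empty because mpicc already supplies the Open MPI include and
    # library paths, so HPL does not need them spelled out:
    MPdir        =
    MPinc        =
    MPlib        =
    # ---- Compilers / linkers ----
    CC           = /home/allan/openmpi/bin/mpicc
    LINKER       = /home/allan/openmpi/bin/mpicc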