[OMPI users] MPI comparison on openib
Hello,

We have been busy this week comparing five different MPI implementations on a small test cluster. Several notable differences have been observed, but I will limit myself to one particular test in this e-mail (64-rank Intel MPI Benchmark alltoall on 8 dual quad-core nodes).

Let's start with the hardware and software conditions:

Hardware: 16 nodes (8 used for this test), each with two Clovertown CPUs (X5355/2.66GHz, quad-core) and 16G RAM. Interconnected with IB 4x SDR on PCI-Express (MT25208).

Software: CentOS-4.3 x86_64 2.6.9-34.0.2smp with OFED-1.1 and Intel compilers 9.1.04x.

MPIs tested: OpenMPI-1.1.3b4, OpenMPI-1.2b3, MVAPICH-0.9.8, MVAPICH2-0.9.8 and ScaMPI-3.10.4 (ScaMPI is a commercial MPI from Scali).

Main question to the OpenMPI developers: why does OpenMPI behave so badly between approx. 10 and 1000 bytes?

Plot: http://www.nsc.liu.se/~cap/all2all_64pe_clover.png

Notes:
* The OpenMPI run tagged 'basic' was done with "-mca coll self,sm,basic"; all other runs were done with whatever setting is the default.
* Both the x- and y-axes are log scaled. The y-axis labels are a bit hard to read, but the first "5." is 50us, the 2nd 500us and so on.

ompi_info:
http://www.nsc.liu.se/~cap/openmpi-1.1.3b4-intel91.info
http://www.nsc.liu.se/~cap/openmpi-1.2b3-intel91.info

Best Regards,
Peter K

--
Peter Kjellström
National Supercomputer Centre, Linköping Sweden
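For readers less familiar with IMB, the alltoall test times roughly the following pattern. This is a minimal sketch, not the IMB source; the iteration count and message size below are arbitrary placeholders for the sizes swept in the plot.

/* Minimal sketch (not the IMB source) of what an alltoall timing loop
 * measures: each rank sends msglen bytes to every other rank, and the
 * average time per MPI_Alltoall is reported. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    const int iters  = 1000;         /* arbitrary for this sketch */
    const int msglen = 128;          /* bytes per destination rank */
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char *sendbuf = malloc((size_t)msglen * nprocs);
    char *recvbuf = malloc((size_t)msglen * nprocs);
    memset(sendbuf, 1, (size_t)msglen * nprocs);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++)
        MPI_Alltoall(sendbuf, msglen, MPI_BYTE,
                     recvbuf, msglen, MPI_BYTE, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d bytes: %.2f us per alltoall\n",
               msglen, 1.0e6 * (t1 - t0) / iters);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

The 'basic' curve in the plot differs only in the collective component selection passed on the mpirun command line ("-mca coll self,sm,basic"); the loop being timed is the same.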
[OMPI users] Trouble Building Open MPI on SGI
I'm trying to install Open MPI 1.1.2 on an Octane 2 SGI machine running IRIX 6.5.25f and gcc 3.3. I understand that the SGI is not officially supported by Open MPI, but I found some advice on how to overcome several problems I was facing at http://www.open-mpi.org/community/lists/users/2006/02/0729.php. However, there's still a problem I don't know how to resolve.

I tried to build and install Open MPI using the following steps:

1. setenv FC /usr/local/bin/g95
2. setenv F77 /usr/local/bin/g77
3. setenv CC /usr/local/bin/gcc
4. setenv CXX /usr/local/bin/g++
5. ./configure --prefix=/usr/local/openmpi-1.1.2-INSTALL
6. make all

After a while, the make process stops with the following error:

maffinity_first_use_component.c:55: error: syntax error before ‘,’ token
maffinity_first_use_component.c:61: warning: initialization makes integer from pointer without a cast
make[2]: *** [maffinity_first_use_component.lo] Error 1
make[2]: Leaving directory ‘/usr/people/jshin/openmpi-1.1.2/opal/mca/maffinity/first_use’
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory ‘/usr/people/jshin/openmpi-1.1.2/opal’
make: *** [all-recursive] Error 1

What could be causing this error? I've attached the outputs from the configure and make processes, along with my config.log file. Any help would be much appreciated!

Thanks,
John Shin

Walter Reed Army Institute of Research
Division of Molecular Pharmacology
Department of Biochemistry
503 Robert Grant Ave.
Silver Spring, MD 20910
Tel: 301-319-9054
Re: [OMPI users] Trouble Building Open MPI on SGI
On Jan 25, 2007, at 1:58 PM, wrote:

> After a while, the make process stops with the following error:
>
> maffinity_first_use_component.c:55: error: syntax error before ‘,’ token
> maffinity_first_use_component.c:61: warning: initialization makes integer from pointer without a cast
> make[2]: *** [maffinity_first_use_component.lo] Error 1
> make[2]: Leaving directory ‘/usr/people/jshin/openmpi-1.1.2/opal/mca/maffinity/first_use’
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory ‘/usr/people/jshin/openmpi-1.1.2/opal’
> make: *** [all-recursive] Error 1

This is quite interesting -- lines 55 and 61 are part of a static structure initialization, and the #defines in there should all be for internal values. So they should all be fine.

Can you send the output of preprocessing this file? Copy-n-paste the majority of the command to compile this file and remove any "-c" and "-o foo" options, and put in -E. Redirect that to a file and send it along -- I'd like to see what happens after all the #define's are resolved.

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
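To make the diagnosis above concrete: a static struct initializer built from #defines only breaks if one of the macros does not expand the way it should. The following is a hypothetical illustration (not the actual maffinity_first_use source); the struct, macro names, and values are made up, and the broken variant is shown only in the comment.

/* Hypothetical example of a static struct initialized from #defines.
 * If one macro ended up expanding to nothing -- for instance because a
 * configure test misfired -- the initializer would preprocess to
 * "{ , ... }" and gcc would report "syntax error before ',' token",
 * which is why the -E (preprocessed) output requested above is useful.
 *
 *     #define COMPONENT_MAJOR            (empty expansion)
 *     static struct component comp = {
 *         COMPONENT_MAJOR,               <- stray comma after expansion
 *         COMPONENT_NAME,
 *     };
 *
 * With the macros defined as intended, the same initializer is fine: */
#include <stdio.h>

struct component {
    int         major;
    const char *name;
};

#define COMPONENT_MAJOR  1
#define COMPONENT_NAME   "first_use"

static struct component comp = { COMPONENT_MAJOR, COMPONENT_NAME };

int main(void)
{
    printf("%s v%d\n", comp.name, comp.major);
    return 0;
}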
[OMPI users] Scrambled communications using ssh starter on multiple nodes.
Recently I wanted to try Open MPI for use with our CFD flow solver WINDUS. The code uses a master/slave methodology where the master handles I/O and issues tasks for the slaves to perform. The original parallel implementation was done in 1993 using PVM, and in 1999 we added support for MPI.

When testing the code with Open MPI 1.1.2 it ran fine on a single machine. As soon as I ran on more than one machine I started getting random errors right away (the attached tar ball has a good and a bad output). It looked like either the messages were out of order or were for the other slave process. In the run mode used there is no slave-to-slave communication. In the attached case the code died near the beginning of the communication between master and slave; sometimes it will run further before it fails.

I have included a tar file with the build and configuration info. The two nodes are identical Xeon 2.8 GHz machines running SLED 10. I am running real-time (no queue) using the ssh starter with the following app file:

-x PVMTASK -x BCFD_PS_MODE --mca pls_rsh_agent /usr/bin/ssh --host skipper2 -wdir /opt/scratch/m209290/ol.scr.16348 -np 1 ./__bcfdbeta.exe
-x PVMTASK -x BCFD_PS_MODE --mca pls_rsh_agent /usr/bin/ssh --host copland -wdir /tmp/mpi.m209290 -np 2 ./__bcfdbeta.exe

The above file fails, but the following works:

-x PVMTASK -x BCFD_PS_MODE --mca pls_rsh_agent /usr/bin/ssh --host skipper2 -wdir /opt/scratch/m209290/ol.scr.16348 -np 1 ./__bcfdbeta.exe
-x PVMTASK -x BCFD_PS_MODE --mca pls_rsh_agent /usr/bin/ssh --host skipper2 -wdir /tmp/mpi.m209290 -np 2 ./__bcfdbeta.exe

The first process is the master and the second two are the slaves. I am not sure what is going wrong; the code runs fine with many other MPI distributions (MPICH1/2, Intel, Scali...). I assume that either I built it wrong or am not running it properly, but I cannot see what I am doing wrong. Any help would be appreciated!
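For reference, here is a minimal sketch of the master/slave pattern described above. This is hypothetical, not the WINDUS code: rank 0 dispatches work and collects replies, matching them by MPI_SOURCE and tag, and there is no slave-to-slave traffic. A "scrambled" run would look like a reply arriving under the wrong tag or carrying another slave's data.

/* Minimal sketch (not WINDUS) of a master/slave exchange: rank 0 hands
 * out one task per slave and collects one result per slave. */
#include <mpi.h>
#include <stdio.h>

#define TAG_WORK   1
#define TAG_RESULT 2

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0) {                       /* master: I/O and task dispatch */
        for (int slave = 1; slave < nprocs; slave++) {
            int task = 100 + slave;
            MPI_Send(&task, 1, MPI_INT, slave, TAG_WORK, MPI_COMM_WORLD);
        }
        for (int i = 1; i < nprocs; i++) {
            int result;
            MPI_Status status;
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT,
                     MPI_COMM_WORLD, &status);
            printf("master: result %d from slave %d\n",
                   result, status.MPI_SOURCE);
        }
    } else {                               /* slave: compute and reply */
        int task, result;
        MPI_Recv(&task, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        result = task * 2;
        MPI_Send(&result, 1, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}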
[OMPI users] Open mpi with MAC OSX on intel macs
Recently LAM/MPI released a beta version, 7.1.3b, that fixes a bug on Mac OS X on Intel Macs for 64-bit builds. Does Open MPI already have this bug fixed?
Re: [OMPI users] Open mpi with MAC OSX on intel macs
Brian fixed it in Open MPI before he fixed it in LAM. :-)

It'll be in both upcoming OMPI releases: v1.1.3 and v1.2.

On Jan 25, 2007, at 5:51 PM, sdamjad wrote:

> Recently LAM/MPI released a beta version, 7.1.3b, that fixes a bug on Mac OS X on Intel Macs for 64-bit builds. Does Open MPI already have this bug fixed?

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
Re: [OMPI users] Open mpi with MAC OSX on intel macs
On Jan 25, 2007, at 3:51 PM, sdamjad wrote:

> Recently LAM/MPI released a beta version, 7.1.3b, that fixes a bug on Mac OS X on Intel Macs for 64-bit builds. Does Open MPI already have this bug fixed?

The same fix was incorporated into Open MPI v1.1.3b5.

Brian

--
Brian Barrett
Open MPI Team, CCS-1
Los Alamos National Laboratory