[OMPI users] MPI comparison on openib

2007-01-25 Thread Peter Kjellstrom
Hello

We have been busy this week comparing five different MPI implementations on a 
small test cluster. Several notable differences have been observed, but I will 
limit myself to one particular test in this e-mail (a 64-rank Intel MPI 
Benchmark alltoall on 8 dual quad-core nodes).

Let's start with the hardware and software conditions:
Hardware: 16 nodes (8 used for this test), each with two Clovertown CPUs 
(X5355/2.66 GHz, quad-core) and 16 GB RAM, interconnected with IB 4x SDR on 
PCI-Express (MT25208).
Software: CentOS-4.3 x86_64, kernel 2.6.9-34.0.2smp, with OFED-1.1 and Intel 
compilers 9.1.04x.

MPIs tested: OpenMPI-1.1.3b4, OpenMPI-1.2b3, MVAPICH-0.9.8, MVAPICH2-0.9.8, and 
ScaMPI-3.10.4 (ScaMPI is a commercial MPI from Scali).

Main question to the OpenMPI developers: why does OpenMPI behave so badly for 
message sizes between approximately 10 and 1000 bytes?

Plot:
 http://www.nsc.liu.se/~cap/all2all_64pe_clover.png
Notes:
* The OpenMPI run tagged 'basic' was done with "-mca coll self,sm,basic" (see 
the example command line below); all other runs were done with whatever 
settings are the default.
* Both the x- and y-axes are log scaled. The y-axis labels are a bit hard to 
read, but the first "5." is 50 µs, the second 500 µs, and so on.
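
(For reference, a run of this kind would be launched with something along the 
lines of the following; the IMB binary path, the hostfile name and the -npmin 
flag are illustrative and not taken from the actual runs:)

  mpirun -np 64 -hostfile hosts8 -mca coll self,sm,basic ./IMB-MPI1 -npmin 64 Alltoall

with the "-mca coll self,sm,basic" part dropped for the default-settings runs.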

ompi_info:
 http://www.nsc.liu.se/~cap/openmpi-1.1.3b4-intel91.info
 http://www.nsc.liu.se/~cap/openmpi-1.2b3-intel91.info

Best Regards,
 Peter K

-- 

  Peter Kjellström
  National Supercomputer Centre, Linköping, Sweden




[OMPI users] Trouble Building Open MPI on SGI

2007-01-25 Thread john.shin1
I'm trying to install Open MPI 1.1.2 on an SGI Octane 2 machine running IRIX 
6.5.25f and gcc 3.3.  I understand that SGI systems are not officially supported 
by Open MPI, but I found some advice at 
http://www.open-mpi.org/community/lists/users/2006/02/0729.php on how to 
overcome several of the problems I was facing.  However, there's still a problem 
I don't know how to resolve.  I tried to build and install Open MPI using the 
following steps:

1.  setenv FC /usr/local/bin/g95
2.  setenv F77 /usr/local/bin/g77
3.  setenv CC /usr/local/bin/gcc
4.  setenv CXX /usr/local/bin/g++
5.  ./configure --prefix=/usr/local/openmpi-1.1.2-INSTALL
6.  make all

After a while, the make process stops with the following error:

maffinity_first_use_component.c:55: error: syntax error before ‘,’ token
maffinity_first_use_component.c:61: warning: initialization makes integer from pointer without a cast
make[2]: *** [maffinity_first_use_component.lo] Error 1
make[2]: Leaving directory ‘/usr/people/jshin/openmpi-1.1.2/opal/mca/maffinity/first_use’
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory ‘/usr/people/jshin/openmpi-1.1.2/opal’
make: *** [all-recursive] Error 1

What could be causing this error?  I've attached the outputs from the configure 
and make processes, along with my config.log file.  Any help would be much 
appreciated!

Thanks,

John Shin

Walter Reed Army Institute of Research
Division of Molecular Pharmacology
Department of Biochemistry
503 Robert Grant Ave.
Silver Spring, MD 20910
Tel: 301-319-9054



ompi-output.tar.gz
Description: GNU Zip compressed data


Re: [OMPI users] Trouble Building Open MPI on SGI

2007-01-25 Thread Jeff Squyres

On Jan 25, 2007, at 1:58 PM, John Shin wrote:


After a while, the make process stops with the following error:

maffinity_first_use_component.c:55: error: syntax error before ‘,’ token
maffinity_first_use_component.c:61: warning: initialization makes integer from pointer without a cast

make[2]: *** [maffinity_first_use_component.lo] Error 1
make[2]: Leaving directory ‘/usr/people/jshin/openmpi-1.1.2/opal/mca/maffinity/first_use’

make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory ‘/usr/people/jshin/openmpi-1.1.2/opal’
make: *** [all-recursive] Error 1


This is quite interesting -- lines 55 and 61 are part of a static  
structure initialization, and the #defines in there should all be for  
internal values.  So they should all be fine.


Can you send the output of preprocessing this file?  Copy and paste the  
command used to compile this file, remove any "-c" and "-o foo" options, and  
add -E.  Redirect that to a file and send it along -- I'd like to see what  
happens after all the #defines are resolved.
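
For example, if the compile command in the make output looked something like

  gcc -DHAVE_CONFIG_H -I. -I../../../.. -I../../../../opal/include -O3 -c maffinity_first_use_component.c -o maffinity_first_use_component.o

(the flags here are only illustrative -- use whatever your build actually  
printed), then the preprocessing step would be

  gcc -DHAVE_CONFIG_H -I. -I../../../.. -I../../../../opal/include -O3 -E maffinity_first_use_component.c > maffinity_first_use_component.i

and maffinity_first_use_component.i would be the file to send.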


--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems




[OMPI users] Scrambled communications using ssh starter on multiple nodes.

2007-01-25 Thread Fisher, Mark S
Recently I wanted to try Open MPI for use with our CFD flow solver
WINDUS. The code uses a master/slave methodology where the master handles
I/O and issues tasks for the slaves to perform. The original parallel
implementation was done in 1993 using PVM, and in 1999 we added support
for MPI.

When testing the code with Open MPI 1.1.2, it ran fine on a single machine. As
soon as I ran on more than one machine I started getting random errors right
away (the attached tarball has a good and a bad output). It looked like either
the messages were out of order or were intended for the other slave process. In
the run mode used there is no slave-to-slave communication. In the attached
file the code died near the beginning of the communication between master and
slave; sometimes it will run further before it fails.

I have included a tar file with the build and configuration info. The
two nodes are identical Xeon 2.8 GHz machines running SLED 10. I am
running real-time (no queue) with the ssh starter, using the following
appfile:

-x PVMTASK -x BCFD_PS_MODE --mca pls_rsh_agent /usr/bin/ssh --host skipper2 -wdir /opt/scratch/m209290/ol.scr.16348 -np 1 ./__bcfdbeta.exe
-x PVMTASK -x BCFD_PS_MODE --mca pls_rsh_agent /usr/bin/ssh --host copland -wdir /tmp/mpi.m209290 -np 2 ./__bcfdbeta.exe

The above file fails, but the following works:

-x PVMTASK -x BCFD_PS_MODE --mca pls_rsh_agent /usr/bin/ssh --host skipper2 -wdir /opt/scratch/m209290/ol.scr.16348 -np 1 ./__bcfdbeta.exe
-x PVMTASK -x BCFD_PS_MODE --mca pls_rsh_agent /usr/bin/ssh --host skipper2 -wdir /tmp/mpi.m209290 -np 2 ./__bcfdbeta.exe

The first process is the master and the second two are the slaves. I am
not sure what is going wrong; the code runs fine with many other MPI
distributions (MPICH1/2, Intel, Scali...). I assume that either I built
it wrong or am not running it properly, but I cannot see what I am doing
wrong. Any help would be appreciated!
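
(For completeness, an appfile like the ones above is launched with a single
mpirun invocation along these lines; the appfile name here is just a
placeholder:)

  mpirun --app appfile.txt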



mpipb.tgz
Description: mpipb.tgz


[OMPI users] Open mpi with MAC OSX on intel macs

2007-01-25 Thread sdamjad
Recently LAM/MPI released a beta version, 7.1.3b, that fixes a Mac OS X bug on 
Intel Macs for 64-bit builds.

Does Open MPI already have this bug fixed?



Re: [OMPI users] Open mpi with MAC OSX on intel macs

2007-01-25 Thread Jeff Squyres

Brian fixed it in Open MPI before he fixed it in LAM.  :-)

It'll be in both upcoming OMPI releases: v1.1.3 and v1.2.



On Jan 25, 2007, at 5:51 PM, sdamjad wrote:

Recently LAM/MPI released a beta version, 7.1.3b, that fixes a Mac OS X bug  
on Intel Macs for 64-bit builds.

Does Open MPI already have this bug fixed?




--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems



Re: [OMPI users] Open mpi with MAC OSX on intel macs

2007-01-25 Thread Brian W. Barrett

On Jan 25, 2007, at 3:51 PM, sdamjad wrote:

Recently LAM/MPI released a beta version, 7.1.3b, that fixes a Mac OS X bug  
on Intel Macs for 64-bit builds.

Does Open MPI already have this bug fixed?



The same fix was incorporated into Open MPI v1.1.3b5.

Brian

--
  Brian Barrett
  Open MPI Team, CCS-1
  Los Alamos National Laboratory