Re: [OMPI users] Run failure on Solaris Opteron with Sun Studio 11

2006-03-09 Thread Brian Barrett

On Mar 8, 2006, at 4:46 AM, Pierre Valiron wrote:


Sorry for the interruption. I'm back on MPI tracks again.

I have rebuilt openmpi-1.0.2a9 with -g and the error is unchanged.

I have also discovered that I don't need to run any Open MPI application
for the error to show up:  'mpirun --help' or plain 'mpirun' produce the
same error:
valiron@icare ~ > mpirun
*Segmentation fault (core dumped)

and
valiron@icare ~ > pstack core
core 'core' of 13842:   mpirun
 fd7ffee9dfe0 strlen () + 20
 fd7ffeef6ab3 vsprintf () + 33
 fd7fff180fd1 opal_vasprintf () + 41
 fd7fff180f88 opal_asprintf () + 98
 004098a3 orterun () + 63
 00407214 main () + 34
 0040708c  ()

Seems very basic !


It turns out this was an error in our compatibility code for asprintf().
We were doing something with va_list structures that Solaris didn't like.
I'm actually surprised that it worked on the UltraSPARC version of Solaris,
but it has been working for us there for some time.
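
The usual portable idiom is to va_copy() the argument list before each
traversal; below is a simplified sketch (not our exact code) of an
asprintf()-style helper written that way:

/* Simplified sketch of a portable vasprintf()-style helper.  The va_list is
 * copied before each use, because traversing the original list twice is not
 * portable and is exactly the kind of thing that bites on some platforms.
 * Not Open MPI's actual implementation. */
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

int my_vasprintf(char **ptr, const char *fmt, va_list ap)
{
    va_list ap2;
    int length;

    /* First pass: ask vsnprintf how long the formatted result will be. */
    va_copy(ap2, ap);
    length = vsnprintf(NULL, 0, fmt, ap2);
    va_end(ap2);
    if (length < 0) {
        return -1;
    }

    *ptr = (char *) malloc(length + 1);
    if (NULL == *ptr) {
        return -1;
    }

    /* Second pass: format into the freshly sized buffer. */
    va_copy(ap2, ap);
    length = vsnprintf(*ptr, length + 1, fmt, ap2);
    va_end(ap2);
    return length;
}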


Anyway, I committed a fix at r9223 on the subversion trunk - it  
should make tonight's nightly tarball for the trunk.  I've also asked  
the release managers for v1.0.2 to push the fix into that release.


Thanks for reporting the issue and for the account.  Let me know if  
you have any further problems.


Brian


--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/




Re: [OMPI users] Run failure on Solaris Opteron with Sun Studio 11

2006-03-09 Thread Pierre Valiron

Brian,

Thanks for the quick overnight fix.
I could not find r9223 on the subversion trunk but I downloaded r9224 
instead.



- Configure and compile are okay


- However, compiling mpi.f90 takes over 35 *minutes* with -O1.  This seems a
bit excessive...  I tried removing the -O option entirely and things are
just as slow.  Is this behaviour related to Open MPI or to some quirk of the
Studio 11 compiler?



- 'mpirun --help' no longer crashes.


- standard output seems messy:

a) 'mpirun -np 4 pwd' randomly returns one or two lines, never four. The same
behaviour occurs if the output is redirected to a file.


b) When running some simple "demo" Fortran code, the standard output is
buffered within Open MPI and all results are issued at the end; no
intermediate output is shown (see the small C test after these notes).



- running a slightly more elaborate program fails:

a) compilation behaves differently with mpif77 and mpif90.

While mpif90 compiles and builds "silently", mpif77 is talkative:

valiron@icare ~/BENCHES > mpif77 -xtarget=opteron -xarch=amd64 -o all all.f
NOTICE: Invoking /opt/Studio11/SUNWspro/bin/f90 -f77 -ftrap=%none 
-I/users/valiron/lib/openmpi-1.1a1r9224/include -xtarget=opteron 
-xarch=amd64 -o all all.f -L/users/valiron/lib/openmpi-1.1a1r9224/lib 
-lmpi -lorte -lopal -lsocket -lnsl -lrt -lm -lthread -ldl

all.f:
   rw_sched:
MAIN all:
   lam_alltoall:
   my_alltoall1:
   my_alltoall2:
   my_alltoall3:
   my_alltoall4:
   check_buf:
   alltoall_sched_ori:
   alltoall_sched_new:


b) whether the code is compiled with mpif77 or mpif90, execution fails:

valiron@icare ~/BENCHES > mpirun -np 2 all
Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR)
Failing at addr:40
*** End of error message ***
Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR)
Failing at addr:40
*** End of error message ***

Compiling with -g adds no more information.
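
For what it is worth, a tiny C test that flushes after every line -- sketched
below; this is not my Fortran demo -- is a quick way to separate plain stdio
buffering from Open MPI's output forwarding:

/* buffered_io.c - minimal sketch showing one way to make forwarded standard
 * output appear promptly under an MPI launcher: flush explicitly after each
 * progress message.  Compile with mpicc. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (i = 0; i < 5; i++) {
        printf("rank %d: step %d\n", rank, i);
        fflush(stdout);   /* without this, stdio may hold output until exit */
    }
    MPI_Finalize();
    return 0;
}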


I attach the all.f program...  (this program was used last summer to discuss
several strategies for alltoall over Ethernet on the LAM/MPI list).


Pierre.





Brian Barrett wrote:

On Mar 8, 2006, at 4:46 AM, Pierre Valiron wrote:

  

Sorry for the interruption. I'm back on MPI tracks again.

I have rebuilt openmpi-1.0.2a9 with -g and the error is unchanged.

I have also discovered that I don't need to run any Open MPI application
for the error to show up:  'mpirun --help' or plain 'mpirun' produce the
same error:
valiron@icare ~ > mpirun
*Segmentation fault (core dumped)

and
valiron@icare ~ > pstack core
core 'core' of 13842:   mpirun
 fd7ffee9dfe0 strlen () + 20
 fd7ffeef6ab3 vsprintf () + 33
 fd7fff180fd1 opal_vasprintf () + 41
 fd7fff180f88 opal_asprintf () + 98
 004098a3 orterun () + 63
 00407214 main () + 34
 0040708c  ()

Seems very basic !



It turns out this was an error in our compatibility code for asprintf().
We were doing something with va_list structures that Solaris didn't like.
I'm actually surprised that it worked on the UltraSPARC version of Solaris,
but it has been working for us there for some time.


Anyway, I committed a fix at r9223 on the subversion trunk - it  
should make tonight's nightly tarball for the trunk.  I've also asked  
the release managers for v1.0.2 to push the fix into that release.


Thanks for reporting the issue and for the account.  Let me know if  
you have any further problems.


Brian


  



--
Support the SAUVONS LA RECHERCHE movement:
http://recherche-en-danger.apinc.org/

  _/_/_/_/_/   _/   Dr. Pierre VALIRON
 _/ _/   _/  _/   Laboratoire d'Astrophysique
_/ _/   _/ _/Observatoire de Grenoble / UJF
   _/_/_/_/_/_/BP 53  F-38041 Grenoble Cedex 9 (France)
  _/  _/   _/http://www-laog.obs.ujf-grenoble.fr/~valiron/
 _/  _/  _/ Mail: pierre.vali...@obs.ujf-grenoble.fr
_/  _/ _/  Phone: +33 4 7651 4787  Fax: +33 4 7644 8821
_/  _/_/





all.f.gz
Description: GNU Zip compressed data


[OMPI users] Myrinet on linux cluster

2006-03-09 Thread Tom Rosmond

Hi,

I am trying to install Open MPI on a Linux cluster with 22 dual-Opteron nodes
and a Myrinet interconnect.  I am having trouble with the build with the GM
libraries.  I configured with:

./configure --prefix-/users/rosmond/ompi --with-gm=/usr/lib64 --enable-mpi2-one-sided


and the environmental variables:

setenv FC pgf90
setenv F77 pgf90
setenv CCPFLAGS /usr/include/gm   ! (note this non-standard location)

The configure seemed to go OK, but the make failed.  As you can see at the
end of the make output, it doesn't like the format of libgm.so.  It looks to
me as though it is using a path (/usr/lib/.) to 32-bit libraries, rather than
the 64-bit one (/usr/lib64/).  Is this correct?  What's the solution?

Tom Rosmond


config.log.bz2
Description: BZip2 compressed data


config_out.bz2
Description: BZip2 compressed data


make_out.bz2
Description: BZip2 compressed data


Re: [OMPI users] [Fwd: MPI_SEND blocks when crossing node boundary]

2006-03-09 Thread Jeff Squyres

Please note that I replied to your original post:

http://www.open-mpi.org/community/lists/users/2006/02/0712.php

Was that not sufficient?  If not, please provide more details on what  
you are attempting to do and what is occurring.  Thanks.
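
In the meantime, one classic cause of an MPI_SEND that completes within a
node but hangs across nodes is relying on internal buffering -- for example,
two ranks that both call MPI_SEND to each other before either posts a
receive.  A minimal, hypothetical illustration (not your code) is below; if
your program does something like this, MPI_SENDRECV or a receive posted in
advance with MPI_IRECV usually cures it.

/* Hypothetical illustration of a send that deadlocks once the message is too
 * large to be buffered eagerly: both ranks call MPI_Send before either posts
 * a receive.  Run with exactly 2 ranks; compile with mpicc. */
#include <stdio.h>
#include <mpi.h>

#define COUNT (1 << 20)   /* large enough to exceed typical eager limits */

int main(int argc, char **argv)
{
    static double sendbuf[COUNT], recvbuf[COUNT];
    int rank, peer;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;

    /* Deadlock-prone: both ranks block in MPI_Send waiting for the peer to
     * receive.  Replacing this pair with MPI_Sendrecv (or posting the
     * receive first with MPI_Irecv) removes the reliance on buffering. */
    MPI_Send(sendbuf, COUNT, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
    MPI_Recv(recvbuf, COUNT, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    printf("rank %d done\n", rank);
    MPI_Finalize();
    return 0;
}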




On Mar 7, 2006, at 2:36 PM, Cezary Sliwa wrote:


Hello again,

The problem is that MPI_SEND blocks forever (the message is still  
not delivered after many hours).


Cezary Sliwa


From: Cezary Sliwa 
Date: February 22, 2006 10:07:04 AM EST
To: us...@open-mpi.org
Subject: MPI_SEND blocks when crossing node boundary



My program runs fine with openmpi-1.0.1 when run from the command  
line (5 processes with empty host file), but when I schedule it  
with qsub to run on 2 nodes it blocks on MPI_SEND


(gdb) info stack
#0  0x0034db30c441 in __libc_sigaction () from /lib64/tls/libpthread.so.0

#1  0x00573002 in opal_evsignal_recalc ()
#2  0x00582a3c in poll_dispatch ()
#3  0x005729f2 in opal_event_loop ()
#4  0x00577e68 in opal_progress ()
#5  0x004eed4a in mca_pml_ob1_send ()
#6  0x0049abdd in PMPI_Send ()
#7  0x00499dc0 in pmpi_send__ ()
#8  0x0042d5d8 in MAIN__ () at main.f:90
#9  0x005877de in main (argc=Variable "argc" is not available.
)







--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/





Re: [OMPI users] MPI for DSP

2006-03-09 Thread Jeff Squyres

On Mar 6, 2006, at 10:19 PM, 赖俊杰 wrote:


Hello everyone, I'm a research assistant at Tsinghua University, and I am
now beginning to study MPI for DSP.  Can anybody tell me something about
this field?


If you're looking for an embedded MPI implementation, Open MPI is not  
for you.  You might want to google around for one -- I know that  
there was a commercial one for at least some period of time (have no  
idea if it still exists or not).


--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/






[OMPI users] Open MPI and MultiRail InfiniBand

2006-03-09 Thread Troy Telford

I've got a machine that has the following config:

Each node has two InfiniBand ports:
 * The first port is on fabric 'a' with switches for 'a'
 * The second port is on fabric 'b' with separate switches for 'b'
 * The two fabrics are not shared ('a' and 'b' can't communicate with one  
another)


I believe that Open MPI is perfectly capable of striping over both fabrics
'a' and 'b', and IIRC, this is the default behavior.


Can Open MPI handle the case where it puts all of its traffic on the first
IB port (i.e. fabric 'a') and leaves the second IB port (i.e. fabric 'b')
free for other uses (I'll use NFS as a humorous example)?


If so, is there any magic required to configure it thusly?

Troy Telford


Re: [OMPI users] Myrinet on linux cluster

2006-03-09 Thread Troy Telford

The configure seemed to go OK, but the make failed.  As you can see at the
end of the make output, it doesn't like the format of libgm.so.  It looks to
me as though it is using a path (/usr/lib/.) to 32-bit libraries, rather than
the 64-bit one (/usr/lib64/).  Is this correct?  What's the solution?


First things first:  Does it compile okay with gcc?

I say this because PGI's compiler has proven stubborn from time to time:
I can compile Open MPI with PGI 6.0 just fine, but Open MPI won't compile
with PGI 6.1 for me either (for different reasons, though -- I posted my
problem earlier this week).


That being said:
The distros get mixed in my mind, so I'm not sure if yours is one that:
a.)  Puts 32-bit libs in /lib32 and /usr/lib32, with 64-bit libs in /lib64  
and /usr/lib64 (and /lib points to lib64)
b.)  32-bit libs are in /lib and /usr/lib, and 64-bit are in /lib64 and  
/usr/lib64


If your machine is a 'b', then yes, it does appear to be trying (and
failing) to use a 32-bit libgm.so.


The first thing I'd do is make sure you have a 64-bit version of libgm.so;
that is what I suspect is missing.


Locate all instances of libgm.so, run 'file libgm.so' to ensure one of 'em  
is 64-bit, and that the 64-bit library is in a path where the linker can  
find it (ld.so.conf or LD_LIBRARY_PATH).

--
Troy Telford


Re: [OMPI users] Myrinet on linux cluster

2006-03-09 Thread Tom Rosmond



Troy Telford wrote:


The configure seemed to go OK, but the make failed.  As you can see at the
end of the make output, it doesn't like the format of libgm.so.  It looks to
me as though it is using a path (/usr/lib/.) to 32-bit libraries, rather than
the 64-bit one (/usr/lib64/).  Is this correct?  What's the solution?



First things first:  Does it compile okay with gcc?
 

I'm not sure I understand; besides, I am strictly a Fortran guy.  However,
I have made a successful build on this system without 'gm' support, but that
is not very interesting because its executables only run on the interactive
node.  Therefore I don't think it's a Fortran compiler problem, especially
since there is already an MPICH/PGI combination running on the system.

I say this because PGI's compiler has proven stubborn from time to time:   
I can compile Open MPI with PGI 6.0 just fine, but PGI 6.1 won't compile  
for me either (different reasons, though -- I posted my problem earlier  
this week).


That being said:
The distros get mixed in my mind, so I'm not sure if yours is one that:
a.)  Puts 32-bit libs in /lib32 and /usr/lib32, with 64-bit libs in /lib64  
and /usr/lib64 (and /lib points to lib64)
b.)  32-bit libs are in /lib and /usr/lib, and 64-bit are in /lib64 and  
/usr/lib64


If your machine is a 'b' then yes, it does appear to be trying (and  
failing) to use a 32-bit libgm.so
 


The answer is 'b'

The first thing I'd do is make sure you have a 64-bit version of libgm.so;  
at least that is what I suspect.


Locate all instances of libgm.so, run 'file libgm.so' to ensure one of 'em  
is 64-bit, and that the 64-bit library is in a path where the linker can  
find it (ld.so.conf or LD_LIBRARY_PATH).
 


I checked, and '/usr/lib64/libgm.so' is definitely a 64-bit library, and I
am sure that /usr/lib64 is by default in a path where the linker looks,
since it is a native 64-bit (Opteron) system.  Just to be sure, however, I
added /usr/lib64 to LD_LIBRARY_PATH, with the same results.


Re: [OMPI users] OpenMPI 1.0.x and PGI pgf90

2006-03-09 Thread Brian Barrett

On Mar 3, 2006, at 10:50 AM, Troy Telford wrote:

On Thu, 02 Mar 2006 03:55:46 -0700, Jeff Squyres wrote:



That being said, I have been unable to get OpenMPI to compile with
PGI 6.1
(but it does finish ./configure; it breaks during 'make').



Can you provide some details on what is going wrong?
We currently only have PGI 5.2 and 6.0 to test with.


No.  I refuse :p

Attached is a tar.bz2 with the config.log and the output of 'make'.

I wouldn't doubt it if it's just a problem with the way I have PGI  
6.1 set up; I just haven't had time to investigate it yet.


I think I have this fixed on the trunk.  It looks like PGI tried to make
the 6.1 compilers support GCC inline assembly, but the support doesn't look
100% correct, so for now we have disabled our inline assembly with PGI 6.1.
It will use the non-inlined version, just like the other versions of the
PGI compilers.


Any tarball on the trunk after r9240 should have the fix.  I've asked  
that this gets pushed into the 1.0 branch to become part of Open MPI  
1.0.2.


Brian


--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/




Re: [OMPI users] OpenMPI 1.0.x and PGI pgf90

2006-03-09 Thread Greg Lindahl
On Thu, Mar 09, 2006 at 09:13:46PM -0500, Brian Barrett wrote:

> I think I have this fixed on the trunk.  It looks like PGI tried to  
> make the 6.1 compilers support GCC inline assembly, but it doesn't  
> look like it's 100% correct,

... and that's no surprise. The spec in the gcc info pages doesn't
reflect reality, and with our compiler, I filed 20 bugs before we got
gmp (gnu multi-precision library, a heavy user of inline assembly) to
work.

Doctor, it hurts when I do this...

-- greg



Re: [OMPI users] OpenMPI 1.0.x and PGI pgf90

2006-03-09 Thread Brian Barrett

On Mar 9, 2006, at 9:28 PM, Greg Lindahl wrote:


On Thu, Mar 09, 2006 at 09:13:46PM -0500, Brian Barrett wrote:


I think I have this fixed on the trunk.  It looks like PGI tried to
make the 6.1 compilers support GCC inline assembly, but it doesn't
look like it's 100% correct,


... and that's no surprise. The spec in the gcc info pages doesn't
reflect reality, and with our compiler, I filed 20 bugs before we got
gmp (gnu multi-precision library, a heavy user of inline assembly) to
work.

Doctor, it hurts when I do this...


Yes, the inline assembly is my second least favorite part of the Open MPI
code base.  And we don't even do anything very complicated with our inline
assembly (memory barriers on platforms that need them, spinlocks, and atomic
add).  The part I found interesting is that this is the only compiler I've
run into to date where the C compiler handled the super-simple test properly
and the C++ compiler did not.  Oh well, it works well enough for our
purposes, so on to more broken things.
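
For the curious, the flavor of construct involved is roughly the simplified,
hypothetical sketch below (not our actual code): a GCC-style inline-assembly
atomic add for x86-64, plus an out-of-line stand-in for what a compiler
without working inline assembly support would fall back to.

/* Simplified, hypothetical sketch of an inline-assembly atomic add and a
 * fallback.  The real non-inlined fallback is the same operation built
 * without inline assembly, not plain C like the stand-in here. */
#include <stdio.h>

#if defined(__GNUC__) && defined(__x86_64__)
static inline void atomic_add_32(volatile int *addr, int delta)
{
    __asm__ __volatile__("lock; addl %1,%0"
                         : "+m" (*addr)
                         : "ir" (delta)
                         : "memory");
}
#else
/* Non-inlined stand-in with the same interface (not actually atomic here). */
static void atomic_add_32(volatile int *addr, int delta)
{
    *addr += delta;
}
#endif

int main(void)
{
    volatile int counter = 0;

    atomic_add_32(&counter, 5);
    printf("counter = %d\n", counter);
    return 0;
}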


The least favorite, of course, is the games we have to play to deal  
with free() and pinned memory caching.  But that's a different story  
altogether...
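
For the curious, the general shape of that trick is to intercept free() so
that any cached RDMA registration can be torn down before the allocator
recycles the memory.  A stripped-down, hypothetical sketch (this is not how
Open MPI actually implements it) using symbol interposition:

/* free_hook.c -- hypothetical sketch of interposing on free() to keep a
 * pinned-memory (RDMA registration) cache consistent.  Build as a shared
 * object and preload it, e.g.:
 *   cc -shared -fPIC -o libfreehook.so free_hook.c -ldl
 *   LD_PRELOAD=./libfreehook.so ./a.out
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Stand-in for a real registration cache: look up 'ptr' and, if the region
 * is pinned for the NIC, deregister it.  Here it is only a stub. */
static void pinned_cache_evict(void *ptr)
{
    (void) ptr;   /* a real cache would search its table and deregister */
}

static void (*real_free)(void *) = NULL;

void free(void *ptr)
{
    if (NULL == real_free) {
        /* Look up libc's free() the first time we are called. */
        real_free = (void (*)(void *)) dlsym(RTLD_NEXT, "free");
    }
    if (ptr != NULL) {
        /* Drop any cached registration before the allocator can hand this
         * memory back out with a stale pinning attached. */
        pinned_cache_evict(ptr);
    }
    real_free(ptr);
}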



Brian

--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/




Re: [OMPI users] Myrinet on linux cluster

2006-03-09 Thread Brian Barrett

On Mar 9, 2006, at 2:51 PM, Tom Rosmond wrote:

I am trying to install Open MPI on a Linux cluster with 22 dual-Opteron
nodes and a Myrinet interconnect.  I am having trouble with the build with
the GM libraries.  I configured with:

./configure --prefix-/users/rosmond/ompi --with-gm=/usr/lib64 --enable-mpi2-one-sided


Can you try configuring with --with-gm (no argument) and send the  
output from configure and make again?  The --with-gm flag takes as an  
argument the installation prefix, not the library prefix.  So in this  
case, it would be --with-gm=/usr, which is kind of pointless, as  
that's a default search location anyway.  Open MPI's configure script  
should automatically look in /usr/lib64.  In fact, it looks like  
configure looked there and found the right libgm, but something went  
amuck later in the process.


Also, you really don't want to configure with the --enable-mpi2-one-sided
flag.  It will not do anything useful and will likely cause very bad things
to happen.  Open MPI 1.0.x does not have any MPI-2 one-sided support.  Open
MPI 1.1 should have a complete implementation of the one-sided chapter.



and the environmental variables:

setenv FC pgf90
setenv F77 pgf90
setenv CCPFLAGS /usr/include/gm   ! (note this non-standard location)


I assume you mean CPPFLAGS=-I/usr/include/gm, which shouldn't cause  
any problems.


The configure seemed to go OK, but the make failed.  As you can see at the
end of the make output, it doesn't like the format of libgm.so.  It looks to
me as though it is using a path (/usr/lib/.) to 32-bit libraries, rather than
the 64-bit one (/usr/lib64/).  Is this correct?  What's the solution?


I'm not sure at this point, but I need a build without the incorrect  
flag to be able to determine what went wrong.  We've built Open MPI  
with 64 bit builds of GM before, so I'm surprised there were any  
problems...


Thanks,

Brian


--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/




Re: [OMPI users] Open MPI and MultiRail InfiniBand

2006-03-09 Thread Brian Barrett

On Mar 9, 2006, at 6:41 PM, Troy Telford wrote:


I've got a machine that has the following config:

Each node has two InfiniBand ports:
  * The first port is on fabric 'a' with switches for 'a'
  * The second port is on fabric 'b' with separate switches for 'b'
  * The two fabrics are not shared ('a' and 'b' can't communicate with one
    another)

I believe that Open MPI is perfectly capable of striping over both fabrics
'a' and 'b', and IIRC, this is the default behavior.

Can Open MPI handle the case where it puts all of its traffic on the first
IB port (i.e. fabric 'a') and leaves the second IB port (i.e. fabric 'b')
free for other uses (I'll use NFS as a humorous example)?

If so, is there any magic required to configure it thusly?


With mvapi, we don't have the functionality in place for the user to  
specify which HCA port is used.  The user can say that at most N HCA  
ports should be used through the btl_mvapi_max_btls MCA parameter.   
So in your case, if you ran Open MPI with:


  mpirun -mca btl_mvapi_max_btls 1 -np X ./foobar

only the first active port would be used for mvapi communication.  I'm not
sure whether this is enough for your needs or not.



Hope this helps,

Brian

--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/




Re: [OMPI users] Myrinet on linux cluster

2006-03-09 Thread Tom Rosmond

Attached are output files from a build with the adjustments you suggested.

setenv FC pgf90
setenv F77 pgf90
setenv CCPFLAGS -I/usr/include/gm

./configure --prefix=/users/rosmond/ompi --with-gm

The results are the same.

P.S. I understand that the mpi2 option is just a dummy.  I use it because I
am porting a code from an SGI Origin, which has full MPI-2 one-sided support.
This option makes it unnecessary to add my own dummy MPI-2 routines to my
source.  My code has both MPI-1 and MPI-2 message passing options, which is
one of the reasons I like Open MPI over MPICH.



Brian Barrett wrote:


On Mar 9, 2006, at 2:51 PM, Tom Rosmond wrote:

 

I am trying to install Open MPI on a Linux cluster with 22 dual-Opteron
nodes and a Myrinet interconnect.  I am having trouble with the build with
the GM libraries.  I configured with:

./configure --prefix-/users/rosmond/ompi --with-gm=/usr/lib64 --enable-mpi2-one-sided
   



Can you try configuring with --with-gm (no argument) and send the  
output from configure and make again?  The --with-gm flag takes as an  
argument the installation prefix, not the library prefix.  So in this  
case, it would be --with-gm=/usr, which is kind of pointless, as  
that's a default search location anyway.  Open MPI's configure script  
should automatically look in /usr/lib64.  In fact, it looks like  
configure looked there and found the right libgm, but something went  
amuck later in the process.


Also, you really don't want to configure with the --enable-mpi2-one-sided
flag.  It will not do anything useful and will likely cause very bad things
to happen.  Open MPI 1.0.x does not have any MPI-2 one-sided support.  Open
MPI 1.1 should have a complete implementation of the one-sided chapter.


 


and the environmental variables:

setenv FC pgf90
setenv F77 pgf90
setenv CCPFLAGS /usr/include/gm   ! (note this non-standard location)
   



I assume you mean CPPFLAGS=-I/usr/include/gm, which shouldn't cause  
any problems.


 

The configure seemed to go OK, but the make failed.  As you can see at the
end of the make output, it doesn't like the format of libgm.so.  It looks to
me as though it is using a path (/usr/lib/.) to 32-bit libraries, rather than
the 64-bit one (/usr/lib64/).  Is this correct?  What's the solution?
   



I'm not sure at this point, but I need a build without the incorrect  
flag to be able to determine what went wrong.  We've built Open MPI  
with 64 bit builds of GM before, so I'm surprised there were any  
problems...


Thanks,

Brian


 



config.log.bz2
Description: BZip2 compressed data


config_out.bz2
Description: BZip2 compressed data


makeall_out.bz2
Description: BZip2 compressed data