[OMPI users] ADIOI Lock problems on NFS and Panasas with OpenMPI

2013-04-26 Thread Mehmet Belgin
Hello everyone,

OpenMPI crashes when doing parallel HDF5 I/O on both NFS and Panasas file systems:

On NFS, we are getting:

  ADIOI_Set_lock:: No locks available
  ADIOI_Set_lock:offset 69744, length 256
  application called MPI_Abort(MPI_COMM_WORLD, 1) - process 124
  File locking failed in ADIOI_Set_lock(fd 25,cmd F_SETLKW/7,type F_WRLCK/1,whence 0) with return value  and errno 25.
  If the file system is NFS, you need to use NFS version 3, ensure that the lockd daemon is running on all the machines, and mount the directory with the 'noac' option (no attribute caching).
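The call that fails in the NFS log is a blocking POSIX byte-range write lock (fcntl with F_SETLKW and F_WRLCK). A minimal sketch, assuming Python 3 on a Unix-like node, that issues the same kind of lock through the standard fcntl module, so locking can be tested on the mount without MPI or HDF5 involved. The function name try_write_lock and the demo path are hypothetical; the offset and length are taken from the log above. This is not ROMIO's code, only the same lock type named in its message.

```python
import errno
import fcntl
import os
import tempfile


def try_write_lock(path, offset, length):
    """Take and release a blocking write lock on [offset, offset + length).

    Returns 0 on success, otherwise the errno of the failed lock call
    (for example errno.ENOLCK, "No locks available").
    Note: this blocks if another process already holds a conflicting lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_EX without LOCK_NB: CPython implements this as
        # fcntl(fd, F_SETLKW, ...) with an F_WRLCK lock on the given range.
        fcntl.lockf(fd, fcntl.LOCK_EX, length, offset, os.SEEK_SET)
        fcntl.lockf(fd, fcntl.LOCK_UN, length, offset, os.SEEK_SET)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical demo on a local temp directory; on the cluster, pass a
    # path inside the NFS-mounted directory instead.
    with tempfile.TemporaryDirectory() as d:
        rc = try_write_lock(os.path.join(d, "locktest.dat"), 69744, 256)
        print("ok" if rc == 0 else errno.errorcode.get(rc, rc))
```

If this succeeds on local disk but returns a nonzero errno such as ENOLCK on the NFS mount, that would point at the NFS lock setup (lockd, NFS version, mount options) named in the error message rather than at HDF5 itself; treat that as a diagnostic hint, not a confirmed cause.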

On Panasas:

ADIOI_PANFS_RESIZE: Rank 13: Resize failed: requested=46996328 actual=9187464.

We are using the Intel 12.1.4 compilers, OpenMPI (we tried all of the recent versions), and HDF5 1.8.10.

I searched the forum archives and found two identical questions from 2008. One 
was left unanswered; the other was answered by Jeff (Squyres), who suggested 
checking with the ROMIO maintainers. So I recompiled everything without ROMIO, 
but I am still seeing the same errors.

Any suggestions will be very much appreciated!

Thanks,
-Mehmet




Re: [OMPI users] ierr vs ierror in F90 mpi module

2013-04-26 Thread Jeff Squyres (jsquyres)
On Apr 25, 2013, at 9:50 AM, W Spector  wrote:

> I just downloaded 1.7.1.  The new files in the use-mpi-f08 look great!
> 
> However, the use-mpi-tkr and use-mpi-ignore-tkr directories don't fare so
> well.  Literally all the interfaces are still 'ierr'.

Oy.  I probably should have realized that before I sent it to you...

> While I realize that both the F90 mpi module and interface checking were
> optional prior to MPI 3.0, the final argument has been called 'ierror' since
> MPI 1!  This really should be fixed.

Will do.  I just talked to my Fortran partner in crime on the OMPI project 
(Craig Rasmussen), and he agreed to add it to his to-do list.  It might take a 
little time to get done, but I'll open a bug on it so that it's not forgotten.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] ierr vs ierror in F90 mpi module

2013-04-26 Thread Jeff Squyres (jsquyres)
On Apr 25, 2013, at 10:52 PM, W Spector  wrote:

> I tried building 1.7.1 on my Ubuntu system.  The default gfortran is v4.6.3,
> so configure won't enable the mpi_f08 module build.  I also tried a
> three-week-old snapshot of the gfortran 4.9 trunk.  This has Tobias's new
> TYPE(*) in it, but not his latest !GCC$ attributes NO_ARG_CHECK stuff.
> However, configure still won't enable the mpi_f08 module.

Yes, Tobias has mailed Craig and me about his latest stuff, and I haven't had a 
chance to incorporate it yet.  It's on the to-do list, though...

> Is there a trick to getting a recent gfortran to compile the mpi_f08 module?

Not yet.  Hopefully soon.

> I went into the openmpi-1.7.1/ompi/mpi/fortran/use-mpi-tkr/scripts directory 
> and modified the files to use ierror instead of ierr.  (One well-crafted line 
> of shell script.)  Did the same with a couple of .h.in files in the 
> use-mpi-tkr and use-mpi-ignore-tkr directories, and 
> use-mpi-tkr/attr_fn-f90-interfaces.h.in.  (One editor command each.)
> 
> With the above, the mpi module is in much better shape.  However, there are
> still some scattered incorrect non-ierror argument names.  A few examples
> from the code I am working with:
> 
>  MPI_Type_create_struct: The 2nd argument should be "array_of_blocklengths", 
> instead of "array_of_block_lengths"
> 
>  MPI_Type_commit: "datatype" instead of "type"
> 
>  MPI_Type_free: Again, "datatype" instead of "type"
> 
> There are more...
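Misnamed dummy arguments only bite when Fortran code calls these routines with keyword arguments (for example datatype=...), so they are easy to miss with positional calls. A minimal Python sketch, assuming single-line `subroutine` headers in the interface files, that compares dummy-argument names against the names in the MPI standard's Fortran bindings for the three routines listed above. The EXPECTED table and the function name misnamed_args are illustrative, not part of Open MPI, and headers continued onto further lines with `&` are not handled.

```python
import re

# Dummy-argument names from the MPI standard's Fortran bindings for the
# routines listed above (keys lowercased, since Fortran is case-insensitive).
# Extend the table to check other routines.
EXPECTED = {
    "mpi_type_create_struct": ["count", "array_of_blocklengths",
                               "array_of_displacements", "array_of_types",
                               "newtype", "ierror"],
    "mpi_type_commit": ["datatype", "ierror"],
    "mpi_type_free": ["datatype", "ierror"],
}

# Matches a one-line header such as "subroutine MPI_Type_free(type, ierr)".
HEADER = re.compile(r"subroutine\s+(\w+)\s*\(([^)]*)\)", re.IGNORECASE)


def misnamed_args(line):
    """Return [(actual, expected), ...] for wrongly named dummy arguments.

    Returns [] if the line is not a header for a routine in EXPECTED.
    A missing or extra argument is reported with None on the missing side.
    """
    m = HEADER.search(line)
    if m is None:
        return []
    expected = EXPECTED.get(m.group(1).lower())
    if expected is None:
        return []
    actual = [a.strip().lower() for a in m.group(2).split(",") if a.strip()]
    pairs = [(actual[i] if i < len(actual) else None,
              expected[i] if i < len(expected) else None)
             for i in range(max(len(actual), len(expected)))]
    return [(a, e) for a, e in pairs if a != e]
```

Running it over each line of the generated interface files would list the remaining mismatches per routine, which could help turn "there are more..." into a concrete patch.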

Cool.  Any chance you could send us a patch?

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI users] ierr vs ierror in F90 mpi module

2013-04-26 Thread W Spector

Hi Jeff,

To take care of the ierr->ierror conversion, simply do the following:

  cd openmpi-1.7.1/ompi/mpi/fortran/use-mpi-tkr/scripts
  ls -1 *.sh | xargs -i -t ex -c ":1,\$s?ierr?ierror?" -c ":wq" {}

Then go up a level to openmpi-1.7.1/ompi/mpi/fortran/use-mpi-tkr and use:

  cd ..
  ls -1 fort*.in | xargs -i -t ex -c ":1,\$s?ierr?ierror?" -c ":wq" {}

Last, the use-mpi-ignore-tkr directory:

  cd ../use-mpi-ignore-tkr
  ls -1 mpi*.in | xargs -i -t ex -c ":1,\$s?ierr?ierror?" -c ":wq" {}
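As written, each ex substitution has no `g` flag and no word boundaries, so it replaces only the first `ierr` on each line and would turn an existing `ierror` into `ierrorror`. A minimal Python sketch of a word-boundary-safe version of the same rename; the helper names rename_ierr and rename_in_files are hypothetical, while the directories and glob patterns come from the commands above.

```python
import re
from pathlib import Path

# \b on both sides: a standalone "ierr" is renamed, but "ierror" (or any
# longer identifier containing "ierr") is left untouched.
IERR = re.compile(r"\bierr\b")


def rename_ierr(text):
    """Replace every standalone 'ierr' token with 'ierror'."""
    return IERR.sub("ierror", text)


def rename_in_files(directory, pattern):
    """Apply rename_ierr in place to files in directory matching a glob.

    Returns the list of files that actually changed.
    """
    changed = []
    for path in sorted(Path(directory).glob(pattern)):
        old = path.read_text()
        new = rename_ierr(old)
        if new != old:
            path.write_text(new)
            changed.append(path)
    return changed
```

For example, rename_in_files("openmpi-1.7.1/ompi/mpi/fortran/use-mpi-tkr/scripts", "*.sh") covers the first command; the fort*.in and mpi*.in patterns map onto the other two. Because the match is anchored on word boundaries, running it twice is harmless.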

As you can tell from my earlier message, I needed a few MPI_Type calls, so I 
fixed the ones I needed in the 
openmpi-1.7.1/ompi/mpi/fortran/use-mpi-tkr/scripts directory.  I didn't 
exhaustively go through and verify every interface in the whole MPI library.


Walter

On 04/26/2013 11:53 AM, Jeff Squyres (jsquyres) wrote:
> On Apr 25, 2013, at 10:52 PM, W Spector  wrote:
> ...
>>  MPI_Type_create_struct: The 2nd argument should be "array_of_blocklengths",
>> instead of "array_of_block_lengths"
>>
>>  MPI_Type_commit: "datatype" instead of "type"
>>
>>  MPI_Type_free: Again, "datatype" instead of "type"
>>
>> There are more...
>
> Cool.  Any chance you could send us a patch?



Re: [OMPI users] ierr vs ierror in F90 mpi module

2013-04-26 Thread Jeff Squyres (jsquyres)
I committed that part; thanks.

On Apr 26, 2013, at 5:51 PM, W Spector  wrote:

> Hi Jeff,
>
> To take care of the ierr->ierror conversion, simply do the following:
> ...


-- 
Jeff Squyres
jsquy...@cisco.com




[OMPI users] Strange "All-to-All" behavior

2013-04-26 Thread Stephan Wolf
Hi,

I have encountered really bad performance when all the nodes send data
to all the other nodes. I use Isend and Irecv with multiple
outstanding sends per node. I debugged the behavior and came to the
following conclusion: it seems that one sender locks out all other
senders for a given receiver. That sender releases the receiver only when
there are no more sends posted or when a node with a lower rank wants to
send to this node (deadlock prevention). As a consequence, node 0
sends all its data to all nodes while all the others wait, then
node 1 sends all its data, …

What is the rationale behind this behaviour, and can I change it with an
MCA parameter?

Thanks

Stephan
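If every rank posts its sends in the same destination order (0, 1, 2, ...), all ranks target the same receiver at once, which matches the serialization described above. A common application-level mitigation, independent of any MCA setting, is a shifted (pairwise) ordering in which rank r sends to (r + k) mod n at step k, so each receiver has exactly one sender per step. The Python sketch below only computes both orderings and their worst-case fan-in; it does not model Open MPI's internal scheduling, and whether reordering the posted MPI_Isend calls changes the behavior reported here is an assumption to test. The function names are hypothetical.

```python
from collections import Counter


def naive_schedule(nprocs):
    """schedule[step][rank] = destination when every rank uses order 0, 1, 2, ..."""
    schedule = []
    for step in range(nprocs - 1):
        # Destinations in ascending order, skipping the sending rank itself.
        schedule.append([step if step < r else step + 1 for r in range(nprocs)])
    return schedule


def shifted_schedule(nprocs):
    """schedule[k - 1][rank] = (rank + k) % nprocs for k = 1 .. nprocs - 1."""
    return [[(r + k) % nprocs for r in range(nprocs)] for k in range(1, nprocs)]


def max_fan_in(schedule):
    """Largest number of ranks targeting the same receiver in any single step."""
    return max(max(Counter(step).values()) for step in schedule)


if __name__ == "__main__":
    n = 8
    print("naive fan-in:", max_fan_in(naive_schedule(n)))      # every rank but 0 hits rank 0 first
    print("shifted fan-in:", max_fan_in(shifted_schedule(n)))  # one sender per receiver per step
```

In MPI code this would correspond to posting MPI_Isend to (rank + k) mod size in loop order k = 1, 2, ..., with the matching MPI_Irecv calls posted for source (rank - k) mod size.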