[OMPI users] ADIOI Lock problems on NFS and Panasas with OpenMPI
Hello everyone,

OpenMPI crashes when doing parallel HDF5 on both NFS and Panasas systems.

On NFS, we are getting:

    ADIOI_Set_lock:: No locks available
    ADIOI_Set_lock:offset 69744, length 256
    application called MPI_Abort(MPI_COMM_WORLD, 1) - process 124
    File locking failed in ADIOI_Set_lock(fd 25,cmd F_SETLKW/7,type F_WRLCK/1,whence 0) with return value and errno 25.
    If the file system is NFS, you need to use NFS version 3, ensure that the lockd daemon is running on all the machines, and mount the directory with the 'noac' option (no attribute caching).

On Panasas:

    ADIOI_PANFS_RESIZE: Rank 13: Resize failed: requested=46996328 actual=9187464.

We are using Intel 12.1.4, OpenMPI (tried all new versions), and HDF5 1.8.10.

I searched the forum archives and found two identical questions from 2008. One was left unanswered, and the other was answered by Jeff (Squyres), who suggested checking with the ROMIO maintainers. So I went ahead and recompiled everything without ROMIO, but I am still seeing the same errors.

Any suggestions will be very much appreciated!

Thanks,
-Mehmet
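The NFS error text above names three things to check: NFS version 3, a running lockd, and the 'noac' mount option. As a rough sketch of how the mount options could be inspected, the helper below reads a mounts table in the /proc/mounts layout and reports the NFS version and whether 'noac' is set. The name check_nfs_mount and its arguments are hypothetical, chosen for illustration; it is not part of Open MPI or ROMIO, and it assumes a Linux-style /proc/mounts.

```shell
#!/bin/sh
# Sketch only: check_nfs_mount is a hypothetical helper, not an Open MPI/ROMIO tool.
# Usage: check_nfs_mount MOUNTPOINT [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts (assumed Linux layout:
# device mountpoint fstype options dump pass).
check_nfs_mount() {
    mountpoint=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '
        # Only consider the requested mount point, and only nfs/nfs4 types.
        $2 == mp && $3 ~ /^nfs/ {
            found = 1
            n = split($4, opts, ",")
            noac = "no"
            vers = "unknown"
            for (i = 1; i <= n; i++) {
                if (opts[i] == "noac") noac = "yes"
                if (opts[i] ~ /^(nfs)?vers=/) {
                    sub(/^(nfs)?vers=/, "", opts[i])
                    vers = opts[i]
                }
            }
            printf "fstype=%s vers=%s noac=%s\n", $3, vers, noac
        }
        END {
            if (!found) {
                print "not an NFS mount: " mp
                exit 1
            }
        }
    ' "$mounts_file"
}
```

For example, `check_nfs_mount /data` would be run on each compute node against the directory holding the HDF5 file (the path /data is only a placeholder). Whether lockd is registered can be checked separately with `rpcinfo -p`, where it normally appears as the nlockmgr service. None of this explains the Panasas resize failure, which is a different code path.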
Re: [OMPI users] ierr vs ierror in F90 mpi module
On Apr 25, 2013, at 9:50 AM, W Spector wrote:

> I just downloaded 1.7.1. The new files in the use-mpi-f08 look great!
>
> However the use-mpi-tkr and use-mpi-ignore-tkr directories don't fare so
> well. Literally all the interfaces are still 'ierr'.

Oy. I probably should have realized that before I sent it to you...

> While I realize that both the F90 mpi module and interface checking, were
> optional prior to MPI 3.0, the final argument has been called 'ierror' since
> MPI 1! This really should be fixed.

Will do. I just talked to my Fortran partner in crime on the OMPI project (Craig Rasmussen), and he agreed to add it to his to-do list. It might take a little time to get done, but I'll open a bug on it so that it's not forgotten.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] ierr vs ierror in F90 mpi module
On Apr 25, 2013, at 10:52 PM, W Spector wrote:

> I tried building 1.7.1 on my Ubuntu system. The default gfortran is v4.6.3,
> so configure won't enable the mpi_f08 module build. I also tried a three
> week old snapshot of the gfortran 4.9 trunk. This has Tobias's new TYPE(*)
> in it, but not his latest !GCC$ attributes NO_ARG_CHECK stuff. However
> configure still won't enable the mpi_f08 module.

Yes, Tobias has mailed me+Craig about his latest stuff and I haven't had a chance to incorporate it yet. It's on the to-do list, though...

> Is there a trick to getting a recent gfortran to compile the mpi_f08 module?

Not yet. Hopefully soon.

> I went into the openmpi-1.7.1/ompi/mpi/fortran/use-mpi-tkr/scripts directory
> and modified the files to use ierror instead of ierr. (One well-crafted line
> of shell script.) Did the same with a couple of .h.in files in the
> use-mpi-tkr and use-mpi-ignore-tkr directories, and
> use-mpi-tkr/attr_fn-f90-interfaces.h.in. (One editor command each.)
>
> With the above, the mpi module is in much better shape. However there are
> still some scattered incorrect non-ierror argument names. A few examples
> from the code I am working with:
>
> MPI_Type_create_struct: The 2nd argument should be "array_of_blocklengths",
> instead of "array_of_block_lengths"
>
> MPI_Type_commit: "datatype" instead of "type"
>
> MPI_Type_free: Again, "datatype" instead of "type"
>
> There are more...

Cool. Any chance you could send us a patch?

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI users] ierr vs ierror in F90 mpi module
Hi Jeff,

To take care of the ierr->ierror conversion, simply do the following:

    cd openmpi-1.7.1/ompi/mpi/fortran/use-mpi-tkr/scripts
    ls -1 *.sh | xargs -i -t ex -c ":1,\$s?ierr?ierror?" -c ":wq" {}

Then go up a level to openmpi-1.7.1/ompi/mpi/fortran/use-mpi-tk and use:

    cd ..
    ls -1 fort*.in | xargs -i -t ex -c ":1,\$s?ierr?ierror?" -c ":wq" {}

Last, the use-mpi-ignore-tkr directory:

    cd ../use-mpi-ignore-tkr
    ls -1 mpi*.in | xargs -i -t ex -c ":1,\$s?ierr?ierror?" -c ":wq" {}

As you can tell from the below, I needed to use a few MPI_Type calls. So fixed the few that I needed in the openmpi-1.7.1/ompi/mpi/fortran/use-mpi-tkr/scripts directory. I didn't exhaustively go through and verify every interface in the whole MPI library.

Walter

On 04/26/2013 11:53 AM, Jeff Squyres (jsquyres) wrote:
> On Apr 25, 2013, at 10:52 PM, W Spector wrote:
> ...
>> I went into the openmpi-1.7.1/ompi/mpi/fortran/use-mpi-tkr/scripts
>> directory and modified the files to use ierror instead of ierr. (One
>> well-crafted line of shell script.) Did the same with a couple of .h.in
>> files in the use-mpi-tkr and use-mpi-ignore-tkr directories, and
>> use-mpi-tkr/attr_fn-f90-interfaces.h.in. (One editor command each.)
>>
>> With the above, the mpi module is in much better shape. However there are
>> still some scattered incorrect non-ierror argument names. A few examples
>> from the code I am working with:
>>
>> MPI_Type_create_struct: The 2nd argument should be
>> "array_of_blocklengths", instead of "array_of_block_lengths"
>>
>> MPI_Type_commit: "datatype" instead of "type"
>>
>> MPI_Type_free: Again, "datatype" instead of "type"
>>
>> There are more...
>
> Cool. Any chance you could send us a patch?
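One caveat on the ex substitution in the message above: `s?ierr?ierror?` has no `g` flag and no word anchors, so it changes only the first match on each line, and running it a second time would turn `ierror` into `ierroror`. A possible variant that can be rerun safely is sketched below. It assumes GNU sed (for `-i` and the `\<` `\>` word-boundary anchors); the function name rename_ierr is made up for this example and is not part of the Open MPI tree.

```shell
#!/bin/sh
# Sketch only: rename_ierr is a hypothetical helper name.
# Rewrites the whole word "ierr" to "ierror" in place, every occurrence
# on every line. Assumes GNU sed: -i edits in place, \< and \> match
# word boundaries, so an existing "ierror" is left untouched.
rename_ierr() {
    sed -i 's/\<ierr\>/ierror/g' "$@"
}
```

Used from the scripts directory, this would look like `rename_ierr *.sh`, and likewise for the `fort*.in` and `mpi*.in` files one and two levels up. It only covers the ierr->ierror rename, not the other argument names mentioned in the thread.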
Re: [OMPI users] ierr vs ierror in F90 mpi module
I committed that part; thanks.

On Apr 26, 2013, at 5:51 PM, W Spector wrote:

> Hi Jeff,
>
> To take care of the ierr->ierror conversion, simply do the following:
>
> cd openmpi-1.7.1/ompi/mpi/fortran/use-mpi-tkr/scripts
> ls -1 *.sh | xargs -i -t ex -c ":1,\$s?ierr?ierror?" -c ":wq" {}
>
> Then go up a level to openmpi-1.7.1/ompi/mpi/fortran/use-mpi-tk and use:
>
> cd ..
> ls -1 fort*.in | xargs -i -t ex -c ":1,\$s?ierr?ierror?" -c ":wq" {}
>
> Last, the use-mpi-ignore-tkr directory:
>
> cd ../use-mpi-ignore-tkr
> ls -1 mpi*.in | xargs -i -t ex -c ":1,\$s?ierr?ierror?" -c ":wq" {}
>
> As you can tell from the below, I needed to use a few MPI_Type calls. So
> fixed the few that I needed in the
> openmpi-1.7.1/ompi/mpi/fortran/use-mpi-tkr/scripts directory. I didn't
> exhaustively go through and verify every interface in the whole MPI library.
>
> Walter
>
> On 04/26/2013 11:53 AM, Jeff Squyres (jsquyres) wrote:
>> On Apr 25, 2013, at 10:52 PM, W Spector wrote:
>> ...
>>> I went into the openmpi-1.7.1/ompi/mpi/fortran/use-mpi-tkr/scripts
>>> directory and modified the files to use ierror instead of ierr. (One
>>> well-crafted line of shell script.) Did the same with a couple of .h.in
>>> files in the use-mpi-tkr and use-mpi-ignore-tkr directories, and
>>> use-mpi-tkr/attr_fn-f90-interfaces.h.in. (One editor command each.)
>>>
>>> With the above, the mpi module is in much better shape. However there are
>>> still some scattered incorrect non-ierror argument names. A few examples
>>> from the code I am working with:
>>>
>>> MPI_Type_create_struct: The 2nd argument should be
>>> "array_of_blocklengths", instead of "array_of_block_lengths"
>>>
>>> MPI_Type_commit: "datatype" instead of "type"
>>>
>>> MPI_Type_free: Again, "datatype" instead of "type"
>>>
>>> There are more...
>>
>> Cool. Any chance you could send us a patch?

--
Jeff Squyres
jsquy...@cisco.com
[OMPI users] Strange "All-to-All" behavior
Hi,

I have encountered really bad performance when all the nodes send data to all the other nodes. I use Isend and Irecv with multiple outstanding sends per node.

I debugged the behavior and came to the following conclusion: it seems that one sender locks out all other senders for one receiver. This sender releases the receiver only when there are no more sends posted, or when a node with a lower rank wants to send to this node (deadlock prevention). As a consequence, node 0 sends all its data to all nodes while all others are waiting, then node 1 sends all its data, and so on.

What is the rationale behind this behaviour, and can I change it with some MCA parameter?

Thanks,
Stephan
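Independent of any MCA setting, one application-level approach often used for all-to-all exchanges is to rotate the order in which each rank posts its sends, so that at any given step every receiver is targeted by a different sender. The sketch below only prints such a schedule (rank r sends to (r + step) mod N); it is an illustration under that assumption, not an Open MPI feature, and the message above does not establish that posting order is the cause. The function name print_schedule is hypothetical.

```shell
#!/bin/sh
# Sketch only: print_schedule is a hypothetical helper that prints a
# rotated send order; it does not call MPI and is not an Open MPI option.
# Usage: print_schedule NPROCS
# At step s (1..NPROCS-1), rank r sends to (r + s) mod NPROCS, so each
# receiver has exactly one sender per step.
print_schedule() {
    nprocs=$1
    step=1
    while [ "$step" -lt "$nprocs" ]; do
        line="step $step:"
        rank=0
        while [ "$rank" -lt "$nprocs" ]; do
            dest=$(( (rank + step) % nprocs ))
            line="$line $rank->$dest"
            rank=$((rank + 1))
        done
        echo "$line"
        step=$((step + 1))
    done
}
```

In an MPI program the same index arithmetic would decide the order of the Isend calls on each rank, with the matching Irecv from (r - step) mod N posted alongside.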