OK, I think we have this resolved in trunk and the fix will go into 1.7.4. The check for MPI_IN_PLACE was wrong in the mpif-h bindings. The fix was tested with your reproducer. Both MPI_SCATTER and MPI_SCATTERV had this bug. The bug does not exist in 1.6.x, though, so I don't know why it was failing there.
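For reference, the MPI_SCATTER side of the fix can be exercised with a minimal variant of your program, along the lines of the untested sketch below. Like your SCATTERV reproducer (quoted further down), it assumes exactly four ranks and 64-bit default reals so that REAL matches MPI_DOUBLE_PRECISION.

PROGRAM SCATTER_IN_PLACE

  IMPLICIT NONE

  INCLUDE 'mpif.h'

  ! Sketch only: assumes four ranks and 64-bit default reals (e.g. -r8),
  ! mirroring the SCATTERV reproducer quoted below.
  REAL, DIMENSION(1200) :: RARR1
  INTEGER :: MYPN, NPES, IERR, I

  CALL MPI_INIT(IERR)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPES, IERR)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, MYPN, IERR)

  IF (MYPN.EQ.0) THEN
    DO I = 1, 1200
      RARR1(I) = 0.001*I
    ENDDO
  ELSE
    RARR1 = 0.0
  ENDIF

  IF (MYPN.EQ.0) THEN
    ! Root keeps its own 300-element block where it already is; the receive
    ! arguments are replaced by the MPI_IN_PLACE sentinel.
    CALL MPI_SCATTER(RARR1, 300, MPI_DOUBLE_PRECISION, &
                     MPI_IN_PLACE, 300, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, IERR)
  ELSE
    ! Non-root ranks receive their block at the start of RARR1; the send
    ! arguments are ignored on these ranks.
    CALL MPI_SCATTER(RARR1, 300, MPI_DOUBLE_PRECISION, &
                     RARR1, 300, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, IERR)
  ENDIF

  CALL MPI_FINALIZE(IERR)

END PROGRAM SCATTER_IN_PLACE

Build and run it the same way as the reproducer (four ranks, 64-bit default reals).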
I don't see a problem with MPI_GATHER or MPI_GATHERV, though. Can you send a reproducer for those?
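For concreteness, an untested skeleton along these lines would exercise that path. It assumes the same setup as the SCATTERV program quoted below (four ranks, 300 double-precision values per rank, 64-bit default reals), with MPI_IN_PLACE supplied as the send buffer on the root.

PROGRAM GATHERV_IN_PLACE

  IMPLICIT NONE

  INCLUDE 'mpif.h'

  ! Sketch only: assumes four ranks and 64-bit default reals (e.g. -r8),
  ! matching the SCATTERV reproducer quoted below.
  REAL, DIMENSION(1200) :: RARR1
  INTEGER, DIMENSION(4) :: RECV_NUM, RECV_OFF
  INTEGER :: MYPN, NPES, IERR, I

  RECV_NUM = (/ 300, 300, 300, 300 /)
  RECV_OFF = (/ 0, 300, 600, 900 /)

  CALL MPI_INIT(IERR)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPES, IERR)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, MYPN, IERR)

  ! Each rank fills its own 300-element block of the global array.
  RARR1 = 0.0
  DO I = 1, 300
    RARR1(MYPN*300 + I) = 0.001*(MYPN*300 + I)
  ENDDO

  IF (MYPN.EQ.0) THEN
    ! Root: the send arguments are replaced by MPI_IN_PLACE; its own block
    ! is assumed to already sit at the correct offset of the receive buffer.
    CALL MPI_GATHERV(MPI_IN_PLACE, 0, MPI_DOUBLE_PRECISION, &
                     RARR1, RECV_NUM, RECV_OFF, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, IERR)
  ELSE
    ! Non-root ranks send their block; the receive arguments are ignored here.
    CALL MPI_GATHERV(RARR1(MYPN*300+1), 300, MPI_DOUBLE_PRECISION, &
                     RARR1, RECV_NUM, RECV_OFF, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, IERR)
  ENDIF

  CALL MPI_FINALIZE(IERR)

END PROGRAM GATHERV_IN_PLACE

With four ranks, RARR1 on rank 0 should end up holding 0.001*(1..1200), which makes a pass/fail check straightforward.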
-Nathan Hjelm
HPC-3, LANL

On Tue, Oct 22, 2013 at 02:28:38PM +0000, Gerlach, Charles A. wrote:
> My reproducer is below (SCATTERV only). It needs to be compiled with 64-bit
> default reals, and I'm running on four cores of a single linux86-64 box
> running SLED 12.3 (except where noted).
>
> Using Open-MPI with different compilers:
>
> With g95: The non-root procs print the correct values, but the root process
> seg faults somewhere inside the SCATTERV call.
> With portland: I get: -1614907703: __hpf_esend: not implemented
> (All procs print out the correct values.)
> With Intel (on a Mac Pro): Complains about a null communicator in
> MPI_FINALIZE and crashes. All procs print out the correct values.
>
> With all three of these compilers, if I comment out the entire IF (MYPN.EQ.0)
> code so that all procs pass RARR1 into both the send and recv buffers, I get
> no errors.
>
> With gfortran: This works either way (with IN_PLACE or without).
>
> Other MPI implementations:
>
> With MPICH2 (any compiler) and Intel Visual Fortran on Windows, the IN_PLACE
> code works. They specifically prohibit passing RARR1 into both the send and
> recv buffers on the root proc.
>
> Reproducer:
>
> PROGRAM MAIN
>
>   IMPLICIT NONE
>
>   REAL, DIMENSION(1200) :: RARR1
>   INTEGER, DIMENSION(4) :: SEND_NUM, SEND_OFF
>   INTEGER :: RECV_NUM, MYPN, NPES, IERR
>
>   INTEGER :: I, J
>
>   INCLUDE 'mpif.h'
>
>   SEND_NUM = (/ 300, 300, 300, 300 /)
>   SEND_OFF = (/ 0, 300, 600, 900 /)
>   RECV_NUM = 300
>
>   CALL MPI_INIT(IERR)
>
>   CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPES, IERR)
>   CALL MPI_COMM_RANK(MPI_COMM_WORLD, MYPN, IERR)
>
>   IF (MYPN.EQ.0) THEN
>     DO I = 1,1200
>       RARR1(I) = 0.001*I
>     ENDDO
>   ELSE
>     RARR1 = 0.0
>   ENDIF
>
>   IF (MYPN.EQ.0) THEN
>     CALL MPI_SCATTERV(RARR1,SEND_NUM,SEND_OFF,MPI_DOUBLE_PRECISION, &
>       MPI_IN_PLACE,RECV_NUM,MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,IERR)
>   ELSE
>     CALL MPI_SCATTERV(RARR1,SEND_NUM,SEND_OFF,MPI_DOUBLE_PRECISION, &
>       RARR1,RECV_NUM,MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,IERR)
>   ENDIF
>
>   OPEN(71+MYPN,FORM='FORMATTED',POSITION='APPEND')
>   WRITE(71+MYPN,'(3E15.7)') RARR1(1:300)
>   CLOSE(71+MYPN)
>
>   CALL MPI_FINALIZE(IERR)
>
> END PROGRAM MAIN
>
> ________________________________________
> From: users [users-boun...@open-mpi.org] on behalf of Nathan Hjelm
> [hje...@lanl.gov]
> Sent: Wednesday, October 09, 2013 12:37 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] MPI_IN_PLACE with GATHERV, AGATHERV, and SCATERV
>
> These functions are tested nightly and there has been no indication any of
> these functions fail with MPI_IN_PLACE. Can you provide a reproducer?
>
> -Nathan
> HPC-3, LANL
>
> On Tue, Oct 08, 2013 at 07:40:50PM +0000, Gerlach, Charles A. wrote:
> > I have an MPI code that was developed using MPICH1 and OpenMPI before the
> > MPI2 standards became commonplace (before MPI_IN_PLACE was an option).
> >
> > So, my code has many examples of GATHERV, AGATHERV and SCATTERV, where I
> > pass the same array in as the SEND_BUF and the RECV_BUF, and this has
> > worked fine for many years.
> >
> > Intel MPI and MPICH2 explicitly disallow this behavior according to the
> > MPI2 standard. So, I have gone through and used MPI_IN_PLACE for all the
> > GATHERV/SCATTERVs that used to pass the same array twice. This code now
> > works with MPICH2 and Intel_MPI, but fails with OpenMPI-1.6.5 on multiple
> > platforms and compilers.
> >
> > PLATFORM              COMPILER          SUCCESS? (For at least one simple example)
> > ------------------------------------------------------------
> > SLED 12.3 (x86-64)    Portland group    fails
> > SLED 12.3 (x86-64)    g95               fails
> > SLED 12.3 (x86-64)    gfortran          works
> > OS X 10.8             Intel             fails
> >
> > In every case where OpenMPI fails with the MPI_IN_PLACE code, I can go
> > back to the original code that passes the same array twice instead of
> > using MPI_IN_PLACE, and it is fine.
> >
> > I have made a test case doing an individual GATHERV with MPI_IN_PLACE,
> > and it works with OpenMPI. So it looks like there is some interaction
> > with my code that is causing the problem. I have no idea how to go about
> > trying to debug it.
> >
> > In summary:
> >
> > OpenMPI-1.6.5 crashes my code when I use GATHERV, AGATHERV, and SCATTERV
> > with MPI_IN_PLACE.
> >
> > Intel MPI and MPICH2 work with my code when I use GATHERV, AGATHERV, and
> > SCATTERV with MPI_IN_PLACE.
> >
> > OpenMPI-1.6.5 works with my code when I pass the same array to SEND_BUF
> > and RECV_BUF instead of using MPI_IN_PLACE for those same GATHERV,
> > AGATHERV, and SCATTERVs.
> >
> > -Charles