OK, I think we have this resolved in trunk, and the fix will go into 1.7.4. The
check for MPI_IN_PLACE was wrong in the mpif-h bindings; both MPI_SCATTER and
MPI_SCATTERV had this bug. The fix was tested with your reproducer. The bug
does not exist in 1.6.x, though, so I don't know why it was failing there.
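
Since plain MPI_SCATTER was affected as well, for reference here is a minimal
sketch of the call pattern that should now work, re-using the variable names
from your SCATTERV reproducer below (fixed 300-element chunks per rank):

  IF (MYPN.EQ.0) THEN
     ! Root: MPI_IN_PLACE as the receive buffer; recv count/type are ignored
     CALL MPI_SCATTER(RARR1, 300, MPI_DOUBLE_PRECISION, &
          MPI_IN_PLACE, 300, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, IERR)
  ELSE
     CALL MPI_SCATTER(RARR1, 300, MPI_DOUBLE_PRECISION, &
          RARR1, 300, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, IERR)
  ENDIF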

I don't see a problem with MPI_GATHER or MPI_GATHERV though. Can you send a
reproducer for those?
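
In case it helps, here is the kind of GATHERV test I would expect to pass. This
is a hypothetical sketch (not taken from our test suite) that, like your
reproducer, assumes 64-bit default reals and 4 ranks. Note that on the gather
side MPI_IN_PLACE goes in the send-buffer argument at the root, not the
receive buffer:

  PROGRAM GATHERV_IN_PLACE

    IMPLICIT NONE
    INCLUDE 'mpif.h'

    REAL, DIMENSION(1200) :: RARR1
    INTEGER, DIMENSION(4) :: RECV_NUM, RECV_OFF
    INTEGER :: MYPN, NPES, IERR, I

    RECV_NUM = (/ 300, 300, 300, 300 /)
    RECV_OFF = (/ 0, 300, 600, 900 /)

    CALL MPI_INIT(IERR)
    CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPES, IERR)
    CALL MPI_COMM_RANK(MPI_COMM_WORLD, MYPN, IERR)

    ! Each rank fills the first 300 elements with its own contribution.
    RARR1 = 0.0
    DO I = 1, 300
       RARR1(I) = 0.001*(300*MYPN + I)
    ENDDO

    IF (MYPN.EQ.0) THEN
       ! Root: MPI_IN_PLACE replaces the SEND buffer for gathers; the root's
       ! contribution is taken in place from RARR1(1:300) (displacement 0).
       CALL MPI_GATHERV(MPI_IN_PLACE, 0, MPI_DOUBLE_PRECISION, &
            RARR1, RECV_NUM, RECV_OFF, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, IERR)
    ELSE
       CALL MPI_GATHERV(RARR1, 300, MPI_DOUBLE_PRECISION, &
            RARR1, RECV_NUM, RECV_OFF, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, IERR)
    ENDIF

    ! Only the root ends up with the full 1200-element array.
    IF (MYPN.EQ.0) WRITE(*,'(3E15.7)') RARR1

    CALL MPI_FINALIZE(IERR)

  END PROGRAM GATHERV_IN_PLACE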

-Nathan Hjelm
HPC-3, LANL

On Tue, Oct 22, 2013 at 02:28:38PM +0000, Gerlach, Charles A. wrote:
> My reproducer is below (SCATTERV only). It needs to be compiled with 64-bit
> default reals, and I'm running it on four cores of a single Linux x86-64 box
> under SLED 12.3 (except where noted).
> 
> Using Open-MPI with different compilers:
> 
> With g95: the non-root procs print the correct values, but the root process
> seg faults somewhere inside the SCATTERV call.
> With Portland Group (PGI): I get "-1614907703: __hpf_esend: not implemented"
>                            (all procs print out the correct values).
> With Intel (on a Mac Pro): it complains about a null communicator in
> MPI_FINALIZE and crashes. All procs print out the correct values.
> 
> With all three of these compilers, if I comment out the entire IF (MYPN.EQ.0) 
> code so that all procs pass RARR1 into both the send and recv buffers, I get 
> no errors.
> 
> With gfortran: This works either way (with IN_PLACE or without).
> 
> Other MPI implementations:
> 
> With MPICH2 (any compiler) and Intel Visual Fortran on Windows, the IN_PLACE
> code works. Those implementations specifically prohibit passing RARR1 as both
> the send and recv buffer on the root proc.
> 
> Reproducer:
> 
> PROGRAM MAIN
> 
>   IMPLICIT NONE
> 
>   REAL, DIMENSION(1200)  :: RARR1
>   INTEGER, DIMENSION(4) :: SEND_NUM, SEND_OFF
>   INTEGER :: RECV_NUM, MYPN, NPES, IERR
> 
>   INTEGER :: I, J
> 
>   INCLUDE 'mpif.h'
> 
>   SEND_NUM = (/ 300, 300, 300, 300 /)
>   SEND_OFF = (/ 0, 300, 600, 900 /)
>   RECV_NUM = 300
> 
>   CALL MPI_INIT(IERR)
> 
>   CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPES, IERR)
>   CALL MPI_COMM_RANK(MPI_COMM_WORLD, MYPN, IERR)
> 
>   IF (MYPN.EQ.0) THEN
>      DO I = 1,1200
>         RARR1(I) = 0.001*I
>      ENDDO
>   ELSE
>      RARR1 = 0.0
>   ENDIF
> 
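>   ! Root passes MPI_IN_PLACE as the receive buffer: its own 300-element
>   ! chunk stays at the start of RARR1, and RECV_NUM and the recv type are
>   ! ignored at the root.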
>   IF (MYPN.EQ.0) THEN
>      CALL MPI_SCATTERV(RARR1,SEND_NUM,SEND_OFF,MPI_DOUBLE_PRECISION, &
>           MPI_IN_PLACE,RECV_NUM,MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,IERR)
>   ELSE
>      CALL MPI_SCATTERV(RARR1,SEND_NUM,SEND_OFF,MPI_DOUBLE_PRECISION, &
>           RARR1,RECV_NUM,MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,IERR)
>   ENDIF
> 
>   OPEN(71+MYPN,FORM='FORMATTED',POSITION='APPEND')
>   WRITE(71+MYPN,'(3E15.7)') RARR1(1:300)
>   CLOSE(71+MYPN)
> 
>   CALL MPI_FINALIZE(IERR)
> 
> END PROGRAM MAIN
> 
> 
> ________________________________________
> From: users [users-boun...@open-mpi.org] on behalf of Nathan Hjelm 
> [hje...@lanl.gov]
> Sent: Wednesday, October 09, 2013 12:37 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] MPI_IN_PLACE with GATHERV, AGATHERV, and SCATERV
> 
> These functions are tested nightly and there has been no indication any of
> these functions fail with MPI_IN_PLACE. Can you provide a reproducer?
> 
> -Nathan
> HPC-3, LANL
> 
> On Tue, Oct 08, 2013 at 07:40:50PM +0000, Gerlach, Charles A. wrote:
> >    I have an MPI code that was developed using MPICH1 and Open MPI before
> >    the MPI-2 standard became commonplace (before MPI_IN_PLACE was an option).
> >
> >
> >
> >    So my code has many examples of GATHERV, ALLGATHERV, and SCATTERV where I
> >    pass the same array in as the SEND_BUF and the RECV_BUF, and this has
> >    worked fine for many years.
> >
> >
> >
> >    Intel MPI and MPICH2 explicitly disallow this behavior, in accordance
> >    with the MPI-2 standard. So I have gone through and used MPI_IN_PLACE for
> >    all the GATHERV/SCATTERV calls that used to pass the same array twice.
> >    This code now works with MPICH2 and Intel MPI, but fails with Open MPI
> >    1.6.5 on multiple platforms and compilers.
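> >
> >    To make the change concrete, here is a sketch of the conversion for a
> >    SCATTERV (RARR1 is the full array on the root, SEND_NUM/SEND_OFF the
> >    per-rank counts and offsets, RECV_NUM the local count):
> >
> >      ! Old, pre-MPI-2 style: same array passed as both send and recv buffer
> >      CALL MPI_SCATTERV(RARR1, SEND_NUM, SEND_OFF, MPI_DOUBLE_PRECISION, &
> >           RARR1, RECV_NUM, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, IERR)
> >
> >      ! New: the root passes MPI_IN_PLACE as the receive buffer instead
> >      IF (MYPN.EQ.0) THEN
> >         CALL MPI_SCATTERV(RARR1, SEND_NUM, SEND_OFF, MPI_DOUBLE_PRECISION, &
> >              MPI_IN_PLACE, RECV_NUM, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, IERR)
> >      ELSE
> >         CALL MPI_SCATTERV(RARR1, SEND_NUM, SEND_OFF, MPI_DOUBLE_PRECISION, &
> >              RARR1, RECV_NUM, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, IERR)
> >      ENDIF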
> >
> >
> >
> >    PLATFORM             COMPILER         SUCCESS? (for at least one simple example)
> >    ---------------------------------------------------------------------------------
> >    SLED 12.3 (x86-64)   Portland Group   fails
> >    SLED 12.3 (x86-64)   g95              fails
> >    SLED 12.3 (x86-64)   gfortran         works
> >    OS X 10.8            Intel            fails
> >
> >
> >
> >
> >
> >    In every case where Open MPI fails with the MPI_IN_PLACE code, I can go
> >    back to the original code that passes the same array twice instead of
> >    using MPI_IN_PLACE, and it is fine.
> >
> >
> >
> >    I have made a test case doing an individual GATHERV with MPI_IN_PLACE,
> >    and it works with Open MPI. So it looks like there is some interaction
> >    with my code that is causing the problem. I have no idea how to go about
> >    trying to debug it.
> >
> >
> >
> >
> >
> >    In summary:
> >
> >
> >
> >    Open MPI 1.6.5 crashes my code when I use GATHERV, ALLGATHERV, and
> >    SCATTERV with MPI_IN_PLACE.
> >
> >    Intel MPI and MPICH2 work with my code when I use GATHERV, ALLGATHERV,
> >    and SCATTERV with MPI_IN_PLACE.
> >
> >    Open MPI 1.6.5 works with my code when I pass the same array to SEND_BUF
> >    and RECV_BUF instead of using MPI_IN_PLACE for those same GATHERV,
> >    ALLGATHERV, and SCATTERV calls.
> >
> >
> >
> >
> >
> >    -Charles
> 
