Hello,

Yes, as I had hinted in my message, I only observed the bug intermittently.

Glad to see it could be fixed so quickly (it affects 2.0 too). I had observed it
for some time, but only recently took the time to make a proper simplified case
and investigate. Guess I should have submitted the issue sooner...

Best regards,

        Yvan Fournier


> Message: 5
> Date: Sat, 5 Nov 2016 22:08:32 +0900
> From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> To: Open MPI Users <users@lists.open-mpi.org>
> Subject: Re: [OMPI users] False positives and even failure with Open
>       MPI and memchecker
> Message-ID:
>       <CAAkFZ5uQhR0m-7GWjmp01DuNpZe1wCAOY19cMb4=rs5zc6s...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
> 
> that really looks like a bug
> 
> if you rewrite your program with
> 
>   MPI_Sendrecv(&l, 1, MPI_INT, rank_next, tag, &l_prev, 1, MPI_INT,
> rank_prev, tag, MPI_COMM_WORLD, &status);
> 
> or even
> 
>   MPI_Irecv(&l_prev, 1, MPI_INT, rank_prev, tag, MPI_COMM_WORLD, &req);
> 
>   MPI_Send(&l, 1, MPI_INT, rank_next, tag, MPI_COMM_WORLD);
> 
>   MPI_Wait(&req, &status);
> 
> then there is no more valgrind warning
> 
> IIRC, Open MPI marks the receive buffer as invalid memory so that it can
> check that only the MPI subroutine updates it. It looks like a step is
> missing in the case of MPI_Recv().
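
For context, here is a conceptual sketch of the mechanism described above,
using the standard Valgrind client requests from <valgrind/memcheck.h>; the
helper names are illustrative and this is not the actual Open MPI memchecker
code:

  /* Conceptual sketch only: mark a posted receive buffer inaccessible while
     MPI owns it, then mark it defined again once the data has arrived.
     Open MPI's memchecker support wraps similar client requests internally. */
  #include <stddef.h>
  #include <valgrind/memcheck.h>

  static void recv_buffer_posted(void *buf, size_t len)
  {
      /* any user-side access to the buffer is now reported by memcheck */
      VALGRIND_MAKE_MEM_NOACCESS(buf, len);
  }

  static void recv_buffer_completed(void *buf, size_t len)
  {
      /* the received bytes are valid application data again */
      VALGRIND_MAKE_MEM_DEFINED(buf, len);
  }

The missing "mark defined" step on the MPI_Recv() path is what the patch
further down addresses.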
> 
> 
> Cheers,
> 
> Gilles
> 
> On Sat, Nov 5, 2016 at 9:48 PM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
> > Hi,
> > 
> > note your printf line is missing.
> > if you printf l_prev, then the valgrind error occurs in all variants
> > 
> > at first glance, it looks like a false positive, and I will investigate it
> > 
> > 
> > Cheers,
> > 
> > Gilles
> > 
> > On Sat, Nov 5, 2016 at 7:59 PM, Yvan Fournier <yvan.fourn...@free.fr> wrote:
> > > Hello,
> > > 
> > > I have observed what seem to be false positives running under Valgrind
> > > when Open MPI is built with --enable-memchecker
> > > (at least with versions 1.10.4 and 2.0.1).
> > > 
> > > Attached is a simple test case (extracted from a larger code) that sends
> > > one int to rank r+1 and receives from rank r-1
> > > (using MPI_COMM_NULL to handle ranks below 0 or above comm size).
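
Since the attachment itself is not reproduced here, the following is a minimal
sketch of that kind of exchange; it is a reconstruction under assumptions (it
uses MPI_PROC_NULL for the out-of-range neighbours and a plain MPI_Send /
MPI_Recv pair, and omits the VARIANT_* switches), not the attached vg_mpi.c:

  /* Sketch of a one-int "shift" along ranks: each rank sends to rank+1 and
     receives from rank-1, with MPI_PROC_NULL for neighbours outside the
     communicator.  Because the chain is open-ended, the blocking send/recv
     ordering cannot deadlock here. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
      int rank, size, l, l_prev = -1, tag = 0;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      int rank_prev = (rank > 0) ? rank - 1 : MPI_PROC_NULL;
      int rank_next = (rank + 1 < size) ? rank + 1 : MPI_PROC_NULL;

      l = rank;  /* value passed to the next rank */

      MPI_Send(&l, 1, MPI_INT, rank_next, tag, MPI_COMM_WORLD);
      MPI_Recv(&l_prev, 1, MPI_INT, rank_prev, tag, MPI_COMM_WORLD, &status);

      printf("rank %d received %d\n", rank, l_prev);

      MPI_Finalize();
      return 0;
  }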
> > > 
> > > Using:
> > > 
> > > ~/opt/openmpi-2.0/bin/mpicc -DVARIANT_1 vg_mpi.c
> > > ~/opt/openmpi-2.0/bin/mpiexec -output-filename vg_log -n 2 valgrind
> > > ./a.out
> > > 
> > > I get the following Valgrind error for rank 1:
> > > 
> > > ==8382== Invalid read of size 4
> > > ==8382==    at 0x400A00: main (in /home/yvan/test/a.out)
> > > ==8382==  Address 0xffefffe70 is on thread 1's stack
> > > ==8382==  in frame #0, created by main (???:)
> > > 
> > > 
> > > Using:
> > > 
> > > ~/opt/openmpi-2.0/bin/mpicc -DVARIANT_2 vg_mpi.c
> > > ~/opt/openmpi-2.0/bin/mpiexec -output-filename vg_log -n 2 valgrind
> > > ./a.out
> > > 
> > > I get the following Valgrind error for rank 1:
> > > 
> > > ==8322== Invalid read of size 4
> > > ==8322==    at 0x400A6C: main (in /home/yvan/test/a.out)
> > > ==8322==  Address 0xcb6f9a0 is 0 bytes inside a block of size 4 alloc'd
> > > ==8322==    at 0x4C29BBE: malloc (in /usr/lib/valgrind/vgpreload_memcheck-
> > > amd64-linux.so)
> > > ==8322==    by 0x400998: main (in /home/yvan/test/a.out)
> > > 
> > > I get no error for the default variant (no -DVARIANT_... flag) with either
> > > Open MPI 2.0.1 or 1.10.4, but I do get an error similar to variant 1 with
> > > the parent code from which the example was extracted.
> > > 
> > > Running under Valgrind's gdb server on that parent code (variant 1), it
> > > even seems the value received on rank 1 is uninitialized, and Valgrind
> > > then complains with the message given above.
> > > 
> > > The code fails to work as intended when run under Valgrind with an Open MPI
> > > built with --enable-memchecker, while it works fine when run with the same
> > > build outside Valgrind, or when run under Valgrind with an Open MPI built
> > > without memchecker.
> > > 
> > > I'm running under Arch Linux (whose packaged Open MPI 1.10.4 is built
> > > with memchecker enabled, rendering it unusable under Valgrind).
> > > 
> > > Did anybody else encounter this type of issue, or does my code contain
> > > an obvious mistake that I am missing?
> > > I initially thought of possible alignment issues, but saw nothing in the
> > > standard that requires any particular alignment, and the malloc-based
> > > variant exhibits the same behavior, while I assume 64-bit alignment for
> > > allocated arrays is the default.
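
On the alignment question, a quick standalone check (independent of the test
case) of what malloc actually returns on a given platform; C11 only guarantees
alignment suitable for any fundamental type, i.e. at least
_Alignof(max_align_t):

  /* Print the platform's fundamental alignment and the alignment of a
     small malloc'd block (purely illustrative, not part of the test case). */
  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <stddef.h>

  int main(void)
  {
      int *p = malloc(sizeof(int));
      if (p == NULL)
          return EXIT_FAILURE;
      printf("_Alignof(max_align_t) = %zu bytes\n", _Alignof(max_align_t));
      printf("malloc'd address modulo 8  = %zu\n", (size_t)((uintptr_t)p % 8));
      printf("malloc'd address modulo 16 = %zu\n", (size_t)((uintptr_t)p % 16));
      free(p);
      return EXIT_SUCCESS;
  }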
> > > 
> > > Best regards,
> > > 
> > >   Yvan Fournier
> > > _______________________________________________
> > > users mailing list
> > > users@lists.open-mpi.org
> > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> 
> 
> ------------------------------
> 
> Message: 6
> Date: Sat, 5 Nov 2016 23:12:54 +0900
> From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> To: Open MPI Users <users@lists.open-mpi.org>
> Subject: Re: [OMPI users] False positives and even failure with Open
>       MPI and memchecker
> Message-ID:
>       <caakfz5sbeocjo0id6hknghzj-jexa1f4a17ghlevwveka1s...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> so it seems we took some shortcuts in pml/ob1
> 
> the attached patch (for the v1.10 branch) should fix this issue
> 
> 
> Cheers
> 
> Gilles
> 
> -------------- next part --------------
> diff --git a/ompi/mca/pml/ob1/pml_ob1_irecv.c
> b/ompi/mca/pml/ob1/pml_ob1_irecv.c
> index 56826a2..97a6a38 100644
> --- a/ompi/mca/pml/ob1/pml_ob1_irecv.c
> +++ b/ompi/mca/pml/ob1/pml_ob1_irecv.c
> @@ -30,6 +30,7 @@
>  #include "pml_ob1_recvfrag.h"
>  #include "ompi/peruse/peruse-internal.h"
>  #include "ompi/message/message.h"
> +#include "ompi/memchecker.h"
>  
>  mca_pml_ob1_recv_request_t *mca_pml_ob1_recvreq = NULL;
>  
> @@ -128,6 +129,17 @@ int mca_pml_ob1_recv(void *addr,
>  
>      rc = recvreq->req_recv.req_base.req_ompi.req_status.MPI_ERROR;
>  
> +    if (recvreq->req_recv.req_base.req_pml_complete) {
> +        /* make buffer defined when the request is completed,
> +           and before releasing the objects. */
> +        MEMCHECKER(
> +            memchecker_call(&opal_memchecker_base_mem_defined,
> +                            recvreq->req_recv.req_base.req_addr,
> +                            recvreq->req_recv.req_base.req_count,
> +                            recvreq->req_recv.req_base.req_datatype);
> +        );
> +    }
> +
>  #if OMPI_ENABLE_THREAD_MULTIPLE
>      MCA_PML_OB1_RECV_REQUEST_RETURN(recvreq);
>  #else
> 
