Hello,

Yes, as I had hinted in my previous message, I observed the bug in an irregular manner.
Glad to see it could be fixed so quickly (it affects 2.0 too). I had observed it for some time, but only recently took the time to build a proper, simplified test case and investigate. I guess I should have submitted the issue sooner...

Best regards,

Yvan Fournier

> Message: 5
> Date: Sat, 5 Nov 2016 22:08:32 +0900
> From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> To: Open MPI Users <users@lists.open-mpi.org>
> Subject: Re: [OMPI users] False positives and even failure with Open MPI and memchecker
>
> That really looks like a bug.
>
> If you rewrite your program with
>
> MPI_Sendrecv(&l, 1, MPI_INT, rank_next, tag, &l_prev, 1, MPI_INT,
>              rank_prev, tag, MPI_COMM_WORLD, &status);
>
> or even
>
> MPI_Irecv(&l_prev, 1, MPI_INT, rank_prev, tag, MPI_COMM_WORLD, &req);
> MPI_Send(&l, 1, MPI_INT, rank_next, tag, MPI_COMM_WORLD);
> MPI_Wait(&req, &status);
>
> then there is no more valgrind warning.
>
> IIRC, Open MPI marks the receive buffer as invalid memory so that it can
> check that only the MPI subroutine updates it. It looks like a step is
> missing in the case of MPI_Recv().
>
> Cheers,
>
> Gilles
>
> On Sat, Nov 5, 2016 at 9:48 PM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
> > Hi,
> >
> > Note your printf line is missing; if you printf l_prev, then the
> > valgrind error occurs in all variants.
> >
> > At first glance, it looks like a false positive, and I will investigate it.
> >
> > Cheers,
> >
> > Gilles
> >
> > On Sat, Nov 5, 2016 at 7:59 PM, Yvan Fournier <yvan.fourn...@free.fr> wrote:
> > > Hello,
> > >
> > > I have observed what seem to be false positives when running under
> > > Valgrind with Open MPI built with --enable-memchecker
> > > (at least with versions 1.10.4 and 2.0.1).
> > >
> > > Attached is a simple test case (extracted from a larger code) that
> > > sends one int to rank r+1 and receives one from rank r-1
> > > (using MPI_COMM_NULL to handle ranks below 0 or above the communicator size).
> > >
> > > Using:
> > >
> > > ~/opt/openmpi-2.0/bin/mpicc -DVARIANT_1 vg_mpi.c
> > > ~/opt/openmpi-2.0/bin/mpiexec -output-filename vg_log -n 2 valgrind ./a.out
> > >
> > > I get the following Valgrind error for rank 1:
> > >
> > > ==8382== Invalid read of size 4
> > > ==8382==    at 0x400A00: main (in /home/yvan/test/a.out)
> > > ==8382==  Address 0xffefffe70 is on thread 1's stack
> > > ==8382==  in frame #0, created by main (???:)
> > >
> > > Using:
> > >
> > > ~/opt/openmpi-2.0/bin/mpicc -DVARIANT_2 vg_mpi.c
> > > ~/opt/openmpi-2.0/bin/mpiexec -output-filename vg_log -n 2 valgrind ./a.out
> > >
> > > I get the following Valgrind error for rank 1:
> > >
> > > ==8322== Invalid read of size 4
> > > ==8322==    at 0x400A6C: main (in /home/yvan/test/a.out)
> > > ==8322==  Address 0xcb6f9a0 is 0 bytes inside a block of size 4 alloc'd
> > > ==8322==    at 0x4C29BBE: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> > > ==8322==    by 0x400998: main (in /home/yvan/test/a.out)
> > >
> > > I get no error for the default variant (no -DVARIANT_...) with either
> > > Open MPI 2.0.1 or 1.10.4,
> > > but I get an error similar to the one given above for variant 1 with the
> > > parent code from which the example was extracted.
> > > Running under Valgrind's gdb server with the parent code of variant 1,
> > > it even seems the value received on rank 1 is uninitialized, and Valgrind
> > > then complains with the given message.
> > >
> > > The code fails to work as intended when run under Valgrind when Open MPI
> > > is built with --enable-memchecker,
> > > while it works fine when run with the same build but not under Valgrind,
> > > or when run under Valgrind with Open MPI built without memchecker.
> > >
> > > I'm running under Arch Linux (whose packaged Open MPI 1.10.4 is built
> > > with memchecker enabled, rendering it unusable under Valgrind).
> > >
> > > Did anybody else encounter this type of issue, or does my code contain
> > > an obvious mistake that I am missing?
> > > I initially thought of possible alignment issues, but saw nothing in the
> > > standard that requires a particular alignment,
> > > and the "malloc"-based variant exhibits the same behavior, while I assume
> > > 64-bit alignment for allocated arrays is the default.
> > >
> > > Best regards,
> > >
> > > Yvan Fournier
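
(For readers who do not have the attachment handy: the sketch below shows the kind of exchange vg_mpi.c performs, with each rank sending one int to rank+1 and receiving one from rank-1. It is only a rough reconstruction, not the attached file; it uses MPI_PROC_NULL for the out-of-range neighbors rather than the MPI_COMM_NULL handling mentioned above, and the made-up USE_SENDRECV switch selects the MPI_Sendrecv form that Gilles notes does not trigger the warning.)

/* vg_mpi_sketch.c -- illustrative only; not the original attachment. */

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, tag = 1;
    int l, l_prev = -1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Neighbors in a simple 1-D chain; the ends talk to MPI_PROC_NULL. */
    int rank_prev = (rank > 0) ? rank - 1 : MPI_PROC_NULL;
    int rank_next = (rank + 1 < size) ? rank + 1 : MPI_PROC_NULL;

    l = rank;

#if defined(USE_SENDRECV)
    /* Combined form suggested above: no memchecker warning. */
    MPI_Sendrecv(&l, 1, MPI_INT, rank_next, tag,
                 &l_prev, 1, MPI_INT, rank_prev, tag,
                 MPI_COMM_WORLD, &status);
#else
    /* Plain send/receive: the pattern that triggers the warning when
       the received value (l_prev) is read afterwards. */
    MPI_Send(&l, 1, MPI_INT, rank_next, tag, MPI_COMM_WORLD);
    MPI_Recv(&l_prev, 1, MPI_INT, rank_prev, tag, MPI_COMM_WORLD, &status);
#endif

    printf("rank %d received %d\n", rank, l_prev);

    MPI_Finalize();
    return 0;
}

Built and run as in the commands quoted above, the plain MPI_Send/MPI_Recv path is the one that produces the "Invalid read of size 4" report when memchecker is enabled.
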
> ------------------------------
>
> Message: 6
> Date: Sat, 5 Nov 2016 23:12:54 +0900
> From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> To: Open MPI Users <users@lists.open-mpi.org>
> Subject: Re: [OMPI users] False positives and even failure with Open MPI and memchecker
>
> So it seems we took some shortcuts in pml/ob1.
>
> The attached patch (for the v1.10 branch) should fix this issue.
>
> Cheers,
>
> Gilles
>
> On Sat, Nov 5, 2016 at 10:08 PM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
> > [...]
>
> -------------- next part --------------
> diff --git a/ompi/mca/pml/ob1/pml_ob1_irecv.c b/ompi/mca/pml/ob1/pml_ob1_irecv.c
> index 56826a2..97a6a38 100644
> --- a/ompi/mca/pml/ob1/pml_ob1_irecv.c
> +++ b/ompi/mca/pml/ob1/pml_ob1_irecv.c
> @@ -30,6 +30,7 @@
>  #include "pml_ob1_recvfrag.h"
>  #include "ompi/peruse/peruse-internal.h"
>  #include "ompi/message/message.h"
> +#include "ompi/memchecker.h"
>
>  mca_pml_ob1_recv_request_t *mca_pml_ob1_recvreq = NULL;
>
> @@ -128,6 +129,17 @@ int mca_pml_ob1_recv(void *addr,
>
>      rc = recvreq->req_recv.req_base.req_ompi.req_status.MPI_ERROR;
>
> +    if (recvreq->req_recv.req_base.req_pml_complete) {
> +        /* make the buffer defined when the request is completed,
> +           and before releasing the objects. */
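
(To make the mechanism behind the patch concrete: memchecker marks the user's receive buffer inaccessible while the receive is pending, and must mark it defined again once the request completes; the hunk above adds that completion step to the blocking MPI_Recv() path. The toy program below is not Open MPI code, only a sketch of the same idea using Valgrind's public client requests; the FORGET_COMPLETION_STEP switch and the file name are made up for illustration. Built with -DFORGET_COMPLETION_STEP and run under valgrind, it produces an "Invalid read of size 4" report much like the ones quoted earlier.)

/* memchecker_toy.c -- illustration only, not Open MPI code.
   Build without optimization, e.g.: gcc -g -O0 memchecker_toy.c */

#include <stdio.h>
#include <valgrind/memcheck.h>   /* client requests; no-ops outside Valgrind */

int main(void)
{
    volatile int buf = 0;        /* stands in for the receive buffer */

    /* When the receive is posted, the buffer is handed to the library
       and marked inaccessible, so premature reads by the application
       are reported. */
    VALGRIND_MAKE_MEM_NOACCESS(&buf, sizeof buf);

#if !defined(FORGET_COMPLETION_STEP)
    /* On completion, the library stores the incoming data and hands the
       buffer back to the application as defined memory -- the step the
       patch above adds for MPI_Recv(). */
    VALGRIND_MAKE_MEM_UNDEFINED(&buf, sizeof buf);
    buf = 42;
    VALGRIND_MAKE_MEM_DEFINED(&buf, sizeof buf);
#endif

    /* With -DFORGET_COMPLETION_STEP the buffer is still marked
       inaccessible here, so memcheck reports "Invalid read of size 4"
       at this line. */
    printf("received %d\n", (int)buf);
    return 0;
}
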
> +        MEMCHECKER(
> +            memchecker_call(&opal_memchecker_base_mem_defined,
> +                            recvreq->req_recv.req_base.req_addr,
> +                            recvreq->req_recv.req_base.req_count,
> +                            recvreq->req_recv.req_base.req_datatype);
> +        );
> +    }
> +
>  #if OMPI_ENABLE_THREAD_MULTIPLE
>      MCA_PML_OB1_RECV_REQUEST_RETURN(recvreq);
>  #else
>
> ------------------------------
>
> End of users Digest, Vol 3645, Issue 1
> **************************************
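
P.S. Until a release containing the fix is available, a small application-side check can confirm that this is a memchecker/Valgrind interaction rather than a real problem in the caller. The helper below is only a sketch (check_recv_buffer is a name I made up; it assumes the Valgrind development headers are installed, and the client requests compile to no-ops when not running under Valgrind):

#include <stddef.h>
#include <valgrind/memcheck.h>

/* Call right after MPI_Recv() on the receive buffer: if memcheck still
   considers the buffer unaddressable or undefined, the error is reported
   here, at the receive, rather than at the first later use. */
static void check_recv_buffer(const void *buf, size_t len)
{
    if (RUNNING_ON_VALGRIND)
        (void)VALGRIND_CHECK_MEM_IS_DEFINED(buf, len);
}

/* Example, for the test case above: check_recv_buffer(&l_prev, sizeof l_prev); */
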