so it seems we took some shortcuts in pml/ob1; the attached patch (for the v1.10 branch) should fix this issue.
Cheers,

Gilles

On Sat, Nov 5, 2016 at 10:08 PM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
> that really looks like a bug
>
> if you rewrite your program with
>
> MPI_Sendrecv(&l, 1, MPI_INT, rank_next, tag, &l_prev, 1, MPI_INT,
>              rank_prev, tag, MPI_COMM_WORLD, &status);
>
> or even
>
> MPI_Irecv(&l_prev, 1, MPI_INT, rank_prev, tag, MPI_COMM_WORLD, &req);
> MPI_Send(&l, 1, MPI_INT, rank_next, tag, MPI_COMM_WORLD);
> MPI_Wait(&req, &status);
>
> then there is no more valgrind warning
>
> iirc, Open MPI marks the receive buffer as invalid memory, so it can
> check that only MPI subroutines update it. it looks like a step is
> missing in the case of MPI_Recv()
>
> Cheers,
>
> Gilles
>
> On Sat, Nov 5, 2016 at 9:48 PM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
>> Hi,
>>
>> note your printf line is missing.
>> if you printf l_prev, then the valgrind error occurs in all variants
>>
>> at first glance, it looks like a false positive, and i will investigate it
>>
>> Cheers,
>>
>> Gilles
>>
>> On Sat, Nov 5, 2016 at 7:59 PM, Yvan Fournier <yvan.fourn...@free.fr> wrote:
>>> Hello,
>>>
>>> I have observed what seem to be false positives running under Valgrind
>>> when Open MPI is built with --enable-memchecker
>>> (at least with versions 1.10.4 and 2.0.1).
>>>
>>> Attached is a simple test case (extracted from a larger code) that sends
>>> one int to rank r+1, and receives one from rank r-1
>>> (using MPI_PROC_NULL to handle ranks below 0 or above the comm size).
>>>
>>> Using:
>>>
>>> ~/opt/openmpi-2.0/bin/mpicc -DVARIANT_1 vg_mpi.c
>>> ~/opt/openmpi-2.0/bin/mpiexec -output-filename vg_log -n 2 valgrind ./a.out
>>>
>>> I get the following Valgrind error for rank 1:
>>>
>>> ==8382== Invalid read of size 4
>>> ==8382==    at 0x400A00: main (in /home/yvan/test/a.out)
>>> ==8382==  Address 0xffefffe70 is on thread 1's stack
>>> ==8382==  in frame #0, created by main (???:)
>>>
>>> Using:
>>>
>>> ~/opt/openmpi-2.0/bin/mpicc -DVARIANT_2 vg_mpi.c
>>> ~/opt/openmpi-2.0/bin/mpiexec -output-filename vg_log -n 2 valgrind ./a.out
>>>
>>> I get the following Valgrind error for rank 1:
>>>
>>> ==8322== Invalid read of size 4
>>> ==8322==    at 0x400A6C: main (in /home/yvan/test/a.out)
>>> ==8322==  Address 0xcb6f9a0 is 0 bytes inside a block of size 4 alloc'd
>>> ==8322==    at 0x4C29BBE: malloc (in
>>>             /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>>> ==8322==    by 0x400998: main (in /home/yvan/test/a.out)
>>>
>>> I get no error for the default variant (no -DVARIANT_... flag) with either
>>> Open MPI 2.0.1 or 1.10.4, but I do get an error similar to variant 1 with
>>> the parent code from which the example was extracted.
>>>
>>> Running under Valgrind's gdb server, for the parent code of variant 1, it
>>> even seems the value received on rank 1 is uninitialized when Valgrind
>>> issues the message above.
>>>
>>> In other words, the code fails to work as intended when run under Valgrind
>>> with Open MPI built with --enable-memchecker, while it works fine when run
>>> with the same build but not under Valgrind, or when run under Valgrind
>>> with an Open MPI built without memchecker.
>>>
>>> I'm running under Arch Linux (whose packaged Open MPI 1.10.4 is built with
>>> memchecker enabled, rendering it unusable under Valgrind).
>>>
>>> Did anybody else encounter this type of issue, or does my code contain an
>>> obvious mistake that I am missing?
>>>
>>> I initially thought of possible alignment issues, but saw nothing in the
>>> standard that requires alignment, and the "malloc"-based variant exhibits
>>> the same behavior, while I assume 64-bit alignment for allocated arrays is
>>> the default anyway.
>>>
>>> Best regards,
>>>
>>> Yvan Fournier
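For reference, since the attached vg_mpi.c is not reproduced in the thread, here is a minimal sketch of the communication pattern under discussion: each rank sends one int to rank r+1 and receives one from rank r-1, with MPI_PROC_NULL at the ends of the chain. The variable names follow the MPI_Sendrecv call quoted above; the USE_RECV switch is only illustrative and does not correspond to the original -DVARIANT_* flags.

/* Sketch of the neighbour exchange discussed in the thread; not the
 * attached vg_mpi.c itself. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, tag = 1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Out-of-range neighbours are mapped to MPI_PROC_NULL, so the end
     * ranks of the chain simply skip the corresponding communication. */
    int rank_prev = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int rank_next = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    int l = rank, l_prev = -1;

#if defined(USE_RECV)
    /* Blocking MPI_Recv path: per the thread, this is the pattern for
     * which the memchecker-enabled build leaves the receive buffer
     * flagged, so the printf below is reported as an invalid read. */
    MPI_Recv(&l_prev, 1, MPI_INT, rank_prev, tag, MPI_COMM_WORLD, &status);
    MPI_Send(&l, 1, MPI_INT, rank_next, tag, MPI_COMM_WORLD);
#else
    /* Workaround suggested above: MPI_Sendrecv (or Irecv/Send/Wait)
     * does not trigger the warning. */
    MPI_Sendrecv(&l, 1, MPI_INT, rank_next, tag,
                 &l_prev, 1, MPI_INT, rank_prev, tag,
                 MPI_COMM_WORLD, &status);
#endif

    printf("rank %d received %d from rank %d\n", rank, l_prev, rank_prev);

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and run under valgrind as in the commands quoted above, the USE_RECV path should reproduce the "Invalid read of size 4" report when Open MPI is built with --enable-memchecker, while the MPI_Sendrecv path should not.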
diff --git a/ompi/mca/pml/ob1/pml_ob1_irecv.c b/ompi/mca/pml/ob1/pml_ob1_irecv.c
index 56826a2..97a6a38 100644
--- a/ompi/mca/pml/ob1/pml_ob1_irecv.c
+++ b/ompi/mca/pml/ob1/pml_ob1_irecv.c
@@ -30,6 +30,7 @@
 #include "pml_ob1_recvfrag.h"
 #include "ompi/peruse/peruse-internal.h"
 #include "ompi/message/message.h"
+#include "ompi/memchecker.h"
 
 mca_pml_ob1_recv_request_t *mca_pml_ob1_recvreq = NULL;
 
@@ -128,6 +129,17 @@ int mca_pml_ob1_recv(void *addr,
 
     rc = recvreq->req_recv.req_base.req_ompi.req_status.MPI_ERROR;
 
+    if (recvreq->req_recv.req_base.req_pml_complete) {
+        /* make the buffer defined when the request is completed,
+           and before releasing the objects. */
+        MEMCHECKER(
+            memchecker_call(&opal_memchecker_base_mem_defined,
+                            recvreq->req_recv.req_base.req_addr,
+                            recvreq->req_recv.req_base.req_count,
+                            recvreq->req_recv.req_base.req_datatype);
+        );
+    }
+
 #if OMPI_ENABLE_THREAD_MULTIPLE
     MCA_PML_OB1_RECV_REQUEST_RETURN(recvreq);
 #else
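For context, the following standalone snippet (not Open MPI code) illustrates the Valgrind client-request mechanism that the memchecker framework wraps; the macros come from valgrind/memcheck.h, and the surrounding program and file name are purely illustrative. It shows why a receive buffer that has been hidden for checking must be marked defined again once the data has arrived, which is the step the hunk above adds to the blocking receive path.

/* Illustrative only: mimics, in a much simplified form, what Open MPI's
 * memchecker does around a receive buffer.  Requires the valgrind headers:
 *
 *   gcc -g vg_defined.c -o vg_defined && valgrind ./vg_defined
 */
#include <stdio.h>
#include <valgrind/memcheck.h>

int main(void)
{
    int buf = 42;   /* stands in for the user's receive buffer */

    /* While a receive is pending, the buffer is hidden so that any access
     * from user code is reported as an invalid read/write. */
    VALGRIND_MAKE_MEM_NOACCESS(&buf, sizeof(buf));

    /* ... the library delivers the message through its own path ... */

    /* The step restored by the patch for the blocking MPI_Recv() path:
     * without this call, the read of buf in the printf below is reported
     * as "Invalid read of size 4", just like the reports in the thread. */
    VALGRIND_MAKE_MEM_DEFINED(&buf, sizeof(buf));

    printf("received %d\n", buf);
    return 0;
}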