So it seems we took some shortcuts in pml/ob1.

The attached patch (for the v1.10 branch, included below) should fix this issue.
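
For background, the idea behind the memchecker support is roughly the
following (an illustrative sketch using Valgrind's client requests directly,
not the actual ob1/opal code paths):

  #include <stdlib.h>
  #include <valgrind/memcheck.h>

  int main(void)
  {
      size_t len = 4 * sizeof(int);
      int *buf = malloc(len);

      /* when a receive is posted: mark the user buffer inaccessible,
         so any premature read of it is reported by Valgrind */
      VALGRIND_MAKE_MEM_NOACCESS(buf, len);

      /* ... the library receives the message and fills buf here ... */

      /* when the receive completes: mark the buffer defined again,
         otherwise every later read shows up as a false positive */
      VALGRIND_MAKE_MEM_DEFINED(buf, len);

      buf[0] = 0;   /* no Valgrind report once the buffer is defined */

      free(buf);
      return 0;
  }

The blocking MPI_Recv() path was missing that last "mark defined" step, which
is what the patch below adds via opal_memchecker_base_mem_defined.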


Cheers

Gilles



On Sat, Nov 5, 2016 at 10:08 PM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
> That really looks like a bug.
>
> If you rewrite your program with
>
>   MPI_Sendrecv(&l, 1, MPI_INT, rank_next, tag, &l_prev, 1, MPI_INT,
> rank_prev, tag, MPI_COMM_WORLD, &status);
>
> or even
>
>   MPI_Irecv(&l_prev, 1, MPI_INT, rank_prev, tag, MPI_COMM_WORLD, &req);
>
>   MPI_Send(&l, 1, MPI_INT, rank_next, tag, MPI_COMM_WORLD);
>
>   MPI_Wait(&req, &status);
>
> then there is no more Valgrind warning.
>
> IIRC, Open MPI marks the receive buffer as invalid memory, so it can
> check that only an MPI subroutine updates it. It looks like a step is
> missing in the case of MPI_Recv().
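>
> For reference, a self-contained version of that rewrite could look like the
> sketch below (a ring-style exchange; the neighbour setup with MPI_PROC_NULL
> and the overall file layout are my guess at your test case, not the actual
> attachment):
>
>   #include <mpi.h>
>   #include <stdio.h>
>
>   int main(int argc, char *argv[])
>   {
>       int rank, size, tag = 1;
>       int rank_prev, rank_next;
>       int l, l_prev = -1;
>       MPI_Request req;
>       MPI_Status status;
>
>       MPI_Init(&argc, &argv);
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>       /* neighbours; MPI_PROC_NULL makes the end-of-chain calls no-ops */
>       rank_prev = (rank > 0) ? rank - 1 : MPI_PROC_NULL;
>       rank_next = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
>
>       l = rank;
>
>       /* post the receive, send, then wait: no Valgrind warning here */
>       MPI_Irecv(&l_prev, 1, MPI_INT, rank_prev, tag, MPI_COMM_WORLD, &req);
>       MPI_Send(&l, 1, MPI_INT, rank_next, tag, MPI_COMM_WORLD);
>       MPI_Wait(&req, &status);
>
>       printf("rank %d: l_prev = %d\n", rank, l_prev);
>
>       MPI_Finalize();
>       return 0;
>   }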
>
>
> Cheers,
>
> Gilles
>
> On Sat, Nov 5, 2016 at 9:48 PM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
>> Hi,
>>
>> Note that your printf line is missing; if you print l_prev, then the
>> Valgrind error occurs in all variants.
>>
>> At first glance, it looks like a false positive, and I will investigate it.
>>
>>
>> Cheers,
>>
>> Gilles
>>
>> On Sat, Nov 5, 2016 at 7:59 PM, Yvan Fournier <yvan.fourn...@free.fr> wrote:
>>> Hello,
>>>
>>> I have observed what seem to be false positives when running under Valgrind
>>> with Open MPI built with --enable-memchecker (at least with versions 1.10.4
>>> and 2.0.1).
>>>
>>> Attached is a simple test case (extracted from a larger code) that sends one
>>> int to rank r+1 and receives from rank r-1 (using MPI_COMM_NULL to handle
>>> ranks below 0 or above the communicator size).
>>>
>>> Using:
>>>
>>> ~/opt/openmpi-2.0/bin/mpicc -DVARIANT_1 vg_mpi.c
>>> ~/opt/openmpi-2.0/bin/mpiexec -output-filename vg_log -n 2 valgrind ./a.out
>>>
>>> I get the following Valgrind error for rank 1:
>>>
>>> ==8382== Invalid read of size 4
>>> ==8382==    at 0x400A00: main (in /home/yvan/test/a.out)
>>> ==8382==  Address 0xffefffe70 is on thread 1's stack
>>> ==8382==  in frame #0, created by main (???:)
>>>
>>>
>>> Using:
>>>
>>> ~/opt/openmpi-2.0/bin/mpicc -DVARIANT_2 vg_mpi.c
>>> ~/opt/openmpi-2.0/bin/mpiexec -output-filename vg_log -n 2 valgrind ./a.out
>>>
>>> I get the following Valgrind error for rank 1:
>>>
>>> ==8322== Invalid read of size 4
>>> ==8322==    at 0x400A6C: main (in /home/yvan/test/a.out)
>>> ==8322==  Address 0xcb6f9a0 is 0 bytes inside a block of size 4 alloc'd
>>> ==8322==    at 0x4C29BBE: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>>> ==8322==    by 0x400998: main (in /home/yvan/test/a.out)
>>>
>>> I get no error for the default variant (no -DVARIANT_... flag) with either
>>> Open MPI 2.0.1 or 1.10.4, but I do get an error similar to variant 1 in the
>>> parent code from which the example was extracted. Running under Valgrind's
>>> gdb server, for the parent code of variant 1, it even seems that the value
>>> received on rank 1 is uninitialized, and then Valgrind complains with the
>>> message given above.
>>>
>>> The code fails to work as intended when run under Valgrind with an Open MPI
>>> built with --enable-memchecker, while it works fine when run with the same
>>> build outside Valgrind, or when run under Valgrind with an Open MPI built
>>> without memchecker.
>>>
>>> I'm running under Arch Linux (whose packaged Open MPI 1.10.4 is built with
>>> memchecker enabled, rendering it unusable under Valgrind).
>>>
>>> Did anybody else encounter this type of issue, or does my code contain an
>>> obvious mistake that I am missing? I initially thought of possible alignment
>>> issues, but saw nothing in the standard that requires them, and the
>>> "malloc"-based variant exhibits the same behavior, while I assume 64-bit
>>> alignment for allocated arrays is the default.
>>>
>>> Best regards,
>>>
>>>   Yvan Fournier
diff --git a/ompi/mca/pml/ob1/pml_ob1_irecv.c b/ompi/mca/pml/ob1/pml_ob1_irecv.c
index 56826a2..97a6a38 100644
--- a/ompi/mca/pml/ob1/pml_ob1_irecv.c
+++ b/ompi/mca/pml/ob1/pml_ob1_irecv.c
@@ -30,6 +30,7 @@
 #include "pml_ob1_recvfrag.h"
 #include "ompi/peruse/peruse-internal.h"
 #include "ompi/message/message.h"
+#include "ompi/memchecker.h"
 
 mca_pml_ob1_recv_request_t *mca_pml_ob1_recvreq = NULL;
 
@@ -128,6 +129,17 @@ int mca_pml_ob1_recv(void *addr,
 
     rc = recvreq->req_recv.req_base.req_ompi.req_status.MPI_ERROR;
 
+    if (recvreq->req_recv.req_base.req_pml_complete) {
+        /* make the buffer defined when the request is completed,
+           and before releasing the objects. */
+        MEMCHECKER(
+            memchecker_call(&opal_memchecker_base_mem_defined,
+                            recvreq->req_recv.req_base.req_addr,
+                            recvreq->req_recv.req_base.req_count,
+                            recvreq->req_recv.req_base.req_datatype);
+        );
+    }
+
 #if OMPI_ENABLE_THREAD_MULTIPLE
     MCA_PML_OB1_RECV_REQUEST_RETURN(recvreq);
 #else
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
