We deliberately chose not to initialize our msg buffers, because doing so takes
considerable time. Instead, we fill in only the portion a given message
requires and then send only that much of the buffer, so the uninitialized
portion is ignored.
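A minimal sketch of the pattern described above, assuming a simple fixed-capacity buffer. `struct msg_buf` and the `msg_buf_*` helpers are hypothetical illustrations, not Open MPI's actual OOB or DSS code:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical message buffer: allocated with malloc (not calloc) and never
 * memset, so every byte past `used` stays uninitialised. */
struct msg_buf {
    unsigned char *data;
    size_t capacity;
    size_t used;      /* number of bytes actually filled in so far */
};

/* Allocate the buffer without initialising its contents. */
static int msg_buf_init(struct msg_buf *b, size_t capacity) {
    b->data = malloc(capacity);   /* deliberately no memset */
    b->capacity = capacity;
    b->used = 0;
    return b->data ? 0 : -1;
}

/* Copy `len` bytes into the next free part of the buffer. */
static int msg_buf_append(struct msg_buf *b, const void *src, size_t len) {
    if (len > b->capacity - b->used)   /* used <= capacity always holds */
        return -1;
    memcpy(b->data + b->used, src, len);
    b->used += len;
    return 0;
}

/* Send only the filled-in prefix with writev(); the tail is never passed in. */
static ssize_t msg_buf_send(int fd, const struct msg_buf *b) {
    struct iovec iov;
    iov.iov_base = b->data;
    iov.iov_len = b->used;            /* not b->capacity */
    return writev(fd, &iov, 1);
}
```

Because `iov_len` is set to `used` rather than `capacity`, the uninitialised remainder of the allocation is never handed to writev(), and the cost of clearing the whole buffer is avoided.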

I don't know of a way to tell valgrind to ignore it, I'm afraid; perhaps a
valgrind guru can help. :-/

Ralph


On Mon, Jun 8, 2009 at 1:09 PM, tom fogal <tfo...@alumni.unh.edu> wrote:

> Hi all,
>
> I've configured a source build of OpenMPI 1.3.2 with valgrind enabled
> [1], and I'm seeing a lot of errors with writev() when I run this under
> valgrind.  For example, with the following `hello, world' program:
>
>  #include <stdio.h>
>  #include <mpi.h>
>
>  int main(int argc, char *argv[]) {
>    MPI_Init(&argc, &argv);
>
>    puts("Hello, world!");
>    MPI_Finalize();
>    return 0;
>  }
>
> I see errors like the following:
>
>  ==12342== Syscall param writev(vector[...]) points to uninitialised byte(s)
>  ==12342==    at 0x61DF733: writev (in /lib/libc-2.7.so)
>  ==12342==    by 0x7889AB9: mca_oob_tcp_msg_send_handler (oob_tcp_msg.c:265)
>  ==12342==    by 0x788B1A0: mca_oob_tcp_peer_send (oob_tcp_peer.c:197)
>  ==12342==    by 0x788FF2A: mca_oob_tcp_send_nb (oob_tcp_send.c:167)
>  ==12342==    by 0x767C7EC: orte_rml_oob_send (rml_oob_send.c:137)
>  ==12342==    by 0x767D19A: orte_rml_oob_send_buffer (rml_oob_send.c:269)
>  ==12342==    by 0x7C9F3DF: allgather (grpcomm_bad_module.c:369)
>  ==12342==    by 0x7C9FD9E: modex (grpcomm_bad_module.c:497)
>  ==12342==    by 0x4E6DCAF: ompi_mpi_init (ompi_mpi_init.c:626)
>
> The full valgrind log is appended [2].  Of course, I could just suppress
> this error, but it seems I get it for most (every?) MPI call that does
> communication (broadcasts, sends, receives, allgathers, etc.).
> I'm worried a suppression would hide too much, including an error
> I've caused.
>
> Have others seen this?  Can I safely suppress everything from
> orte_rml_oob_send_buffer down?
>
> -tom
>
> [1] configured via: gnu_pkg \
>    --enable-debug \
>    --enable-memchecker \
>    --disable-mpi-f77 \
>    --enable-pretty-print-stacktrace \
>    --enable-cxx-exceptions \
>    --enable-mpi-threads \
>    --with-valgrind=${PREFIX} \
>    --without-gm \
>    --without-mx \
>    --without-openib \
>    --without-psm \
>    --with-pic \
>    --with-gnu-ld
>  where gnu_pkg is basically a function which calls configure with
>  --prefix=${PREFIX}.
>
> [2]
> ==12342== Memcheck, a memory error detector.
> ==12342== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al.
> ==12342== Using LibVEX rev 1884, a library for dynamic binary translation.
> ==12342== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.
> ==12342== Using valgrind-3.4.1, a dynamic binary instrumentation framework.
> ==12342== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.
> ==12342== For more details, rerun with: -v
> ==12342==
> ==12342== My PID = 12342, parent PID = 12341.  Prog and args are:
> ==12342==    ./a.out
> ==12342==
> ==12342== Warning: client syscall munmap tried to modify addresses 0xffffffffffffffff-0xffe
> ==12342== Syscall param writev(vector[...]) points to uninitialised byte(s)
> ==12342==    at 0x61DF733: writev (in /lib/libc-2.7.so)
> ==12342==    by 0x7889AB9: mca_oob_tcp_msg_send_handler (oob_tcp_msg.c:265)
> ==12342==    by 0x788B1A0: mca_oob_tcp_peer_send (oob_tcp_peer.c:197)
> ==12342==    by 0x788FF2A: mca_oob_tcp_send_nb (oob_tcp_send.c:167)
> ==12342==    by 0x767C7EC: orte_rml_oob_send (rml_oob_send.c:137)
> ==12342==    by 0x767D19A: orte_rml_oob_send_buffer (rml_oob_send.c:269)
> ==12342==    by 0x7C9F3DF: allgather (grpcomm_bad_module.c:369)
> ==12342==    by 0x7C9FD9E: modex (grpcomm_bad_module.c:497)
> ==12342==    by 0x4E6DCAF: ompi_mpi_init (ompi_mpi_init.c:626)
> ==12342==    by 0x4EAAC88: PMPI_Init (pinit.c:80)
> ==12342==    by 0x400857: main (hello.c:5)
> ==12342==  Address 0x677697b is 107 bytes inside a block of size 256 alloc'd
> ==12342==    at 0x4C22A51: realloc (vg_replace_malloc.c:429)
> ==12342==    by 0x53DCBE0: opal_dss_buffer_extend (dss_internal_functions.c:63)
> ==12342==    by 0x53DE4BA: opal_dss_copy_payload (dss_load_unload.c:164)
> ==12342==    by 0x7C9F314: allgather (grpcomm_bad_module.c:363)
> ==12342==    by 0x7C9FD9E: modex (grpcomm_bad_module.c:497)
> ==12342==    by 0x4E6DCAF: ompi_mpi_init (ompi_mpi_init.c:626)
> ==12342==    by 0x4EAAC88: PMPI_Init (pinit.c:80)
> ==12342==    by 0x400857: main (hello.c:5)
> ==12342==  Uninitialised value was created by a stack allocation
> ==12342==    at 0x53FFA60: opal_ifinit (if.c:147)
> {
>   <insert a suppression name here>
>   Memcheck:Param
>   writev(vector[...])
>   fun:writev
>   fun:mca_oob_tcp_msg_send_handler
>   fun:mca_oob_tcp_peer_send
>   fun:mca_oob_tcp_send_nb
>   fun:orte_rml_oob_send
>   fun:orte_rml_oob_send_buffer
>   fun:allgather
>   fun:modex
>   fun:ompi_mpi_init
>   fun:PMPI_Init
>   fun:main
> }
> ==12342==
> ==12342== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 307 from 3)
> ==12342== malloc/free: in use at exit: 204,012 bytes in 2,022 blocks.
> ==12342== malloc/free: 10,382 allocs, 8,360 frees, 14,603,162 bytes allocated.
> ==12342== For a detailed leak analysis,  rerun with: --leak-check=yes
> ==12342== For counts of detected errors, rerun with: -v
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
