George Bosilca <bosi...@eecs.utk.edu> writes:
> There is a whole page on valgrind web page about this topic. Please
> read http://valgrind.org/docs/manual/manual-core.html#manual-core.suppress
> for more information.

Even better, Ralph (et al.), would be if we could just make valgrind think
this is defined memory. One can do this with client requests:

  http://valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs

in particular, VALGRIND_MAKE_MEM_DEFINED. This would prevent vg from
warning about it, without having to memset the whole buffer or similar.

Is requesting that this be done here enough? Or shall I open a ticket?

Thanks,

-tom

> On Jun 8, 2009, at 15:24 , Ralph Castain wrote:
>
> > We deliberately choose to not initialize our msg buffers as this
> > takes considerable time. Instead, we fill in only the portion
> > required by a given message, and then send only that much of the
> > buffer. Thus, the uninitialized portion is ignored.
> >
> > I don't know of a way to tell valgrind to ignore it, I'm afraid -
> > perhaps a valgrind guru can be of help. :-/
> >
> > Ralph
> >
> >
> > On Mon, Jun 8, 2009 at 1:09 PM, tom fogal <tfo...@alumni.unh.edu> wrote:
> > Hi all,
> >
> > I've configured a source build of OpenMPI 1.3.2 with valgrind enabled
> > [1], and I'm seeing a lot of errors with writev() when I run this under
> > valgrind.
> > For example, with the following `hello, world' program:
> >
> > #include <stdio.h>
> > #include <mpi.h>
> >
> > int main(int argc, char *argv[]) {
> >   MPI_Init(&argc, &argv);
> >
> >   puts("Hello, world!");
> >   MPI_Finalize();
> >   return 0;
> > }
> >
> > I see errors like the following:
> >
> > ==12342== Syscall param writev(vector[...]) points to uninitialised byte(s)
> > ==12342==    at 0x61DF733: writev (in /lib/libc-2.7.so)
> > ==12342==    by 0x7889AB9: mca_oob_tcp_msg_send_handler (oob_tcp_msg.c:265)
> > ==12342==    by 0x788B1A0: mca_oob_tcp_peer_send (oob_tcp_peer.c:197)
> > ==12342==    by 0x788FF2A: mca_oob_tcp_send_nb (oob_tcp_send.c:167)
> > ==12342==    by 0x767C7EC: orte_rml_oob_send (rml_oob_send.c:137)
> > ==12342==    by 0x767D19A: orte_rml_oob_send_buffer (rml_oob_send.c:269)
> > ==12342==    by 0x7C9F3DF: allgather (grpcomm_bad_module.c:369)
> > ==12342==    by 0x7C9FD9E: modex (grpcomm_bad_module.c:497)
> > ==12342==    by 0x4E6DCAF: ompi_mpi_init (ompi_mpi_init.c:626)
> >
> > The full vg log is appended [2]. Of course, I could just suppress
> > this error, but I get this for a lot (every?) MPI call which does
> > communication, it seems (broadcasts, sends, recv's, allgathers, etc.).
> > I'm worried a suppression would suppress too much / suppress an error
> > I've caused.
> >
> > Have others seen this? Can I suppress perhaps from the
> > orte_rml_oob_send_buffer down (safely)?
> >
> > -tom
> >
> > [1] configured via: gnu_pkg \
> >   --enable-debug \
> >   --enable-memchecker \
> >   --disable-mpi-f77 \
> >   --enable-pretty-print-stacktrace \
> >   --enable-cxx-exceptions \
> >   --enable-mpi-threads \
> >   --with-valgrind=${PREFIX} \
> >   --without-gm \
> >   --without-mx \
> >   --without-openib \
> >   --without-psm \
> >   --with-pic \
> >   --with-gnu-ld
> > where gnu_pkg is basically a function which calls configure with
> > --prefix=${PREFIX}.
> >
> > [2]
> > ==12342== Memcheck, a memory error detector.
> > ==12342== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al.
> > ==12342== Using LibVEX rev 1884, a library for dynamic binary translation.
> > ==12342== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.
> > ==12342== Using valgrind-3.4.1, a dynamic binary instrumentation framework.
> > ==12342== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.
> > ==12342== For more details, rerun with: -v
> > ==12342==
> > ==12342== My PID = 12342, parent PID = 12341. Prog and args are:
> > ==12342==    ./a.out
> > ==12342==
> > ==12342== Warning: client syscall munmap tried to modify addresses 0xffffffffffffffff-0xffe
> > ==12342== Syscall param writev(vector[...]) points to uninitialised byte(s)
> > ==12342==    at 0x61DF733: writev (in /lib/libc-2.7.so)
> > ==12342==    by 0x7889AB9: mca_oob_tcp_msg_send_handler (oob_tcp_msg.c:265)
> > ==12342==    by 0x788B1A0: mca_oob_tcp_peer_send (oob_tcp_peer.c:197)
> > ==12342==    by 0x788FF2A: mca_oob_tcp_send_nb (oob_tcp_send.c:167)
> > ==12342==    by 0x767C7EC: orte_rml_oob_send (rml_oob_send.c:137)
> > ==12342==    by 0x767D19A: orte_rml_oob_send_buffer (rml_oob_send.c:269)
> > ==12342==    by 0x7C9F3DF: allgather (grpcomm_bad_module.c:369)
> > ==12342==    by 0x7C9FD9E: modex (grpcomm_bad_module.c:497)
> > ==12342==    by 0x4E6DCAF: ompi_mpi_init (ompi_mpi_init.c:626)
> > ==12342==    by 0x4EAAC88: PMPI_Init (pinit.c:80)
> > ==12342==    by 0x400857: main (hello.c:5)
> > ==12342==  Address 0x677697b is 107 bytes inside a block of size 256 alloc'd
> > ==12342==    at 0x4C22A51: realloc (vg_replace_malloc.c:429)
> > ==12342==    by 0x53DCBE0: opal_dss_buffer_extend (dss_internal_functions.c:63)
> > ==12342==    by 0x53DE4BA: opal_dss_copy_payload (dss_load_unload.c:164)
> > ==12342==    by 0x7C9F314: allgather (grpcomm_bad_module.c:363)
> > ==12342==    by 0x7C9FD9E: modex (grpcomm_bad_module.c:497)
> > ==12342==    by 0x4E6DCAF: ompi_mpi_init (ompi_mpi_init.c:626)
> > ==12342==    by 0x4EAAC88: PMPI_Init
> > (pinit.c:80)
> > ==12342==    by 0x400857: main (hello.c:5)
> > ==12342==  Uninitialised value was created by a stack allocation
> > ==12342==    at 0x53FFA60: opal_ifinit (if.c:147)
> > {
> >    <insert a suppression name here>
> >    Memcheck:Param
> >    writev(vector[...])
> >    fun:writev
> >    fun:mca_oob_tcp_msg_send_handler
> >    fun:mca_oob_tcp_peer_send
> >    fun:mca_oob_tcp_send_nb
> >    fun:orte_rml_oob_send
> >    fun:orte_rml_oob_send_buffer
> >    fun:allgather
> >    fun:modex
> >    fun:ompi_mpi_init
> >    fun:PMPI_Init
> >    fun:main
> > }
> > ==12342==
> > ==12342== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 307 from 3)
> > ==12342== malloc/free: in use at exit: 204,012 bytes in 2,022 blocks.
> > ==12342== malloc/free: 10,382 allocs, 8,360 frees, 14,603,162 bytes allocated.
> > ==12342== For a detailed leak analysis, rerun with: --leak-check=yes
> > ==12342== For counts of detected errors, rerun with: -v
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
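
For concreteness, here is a rough sketch of the client-request idea
mentioned at the top of this message. It is not Open MPI's code:
make_msg_buffer() and BUF_SIZE are made-up names for illustration (256
just mirrors the block size in the log above), and the fallback macro is
only there so the file still builds where valgrind's headers are not
installed. The one real API used is VALGRIND_MAKE_MEM_DEFINED(addr, len)
from <valgrind/memcheck.h>.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Use the real client request when valgrind's header is available. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#  endif
#endif
#ifndef VALGRIND_MAKE_MEM_DEFINED
/* Assumption: no valgrind headers here, so fall back to a no-op. */
#  define VALGRIND_MAKE_MEM_DEFINED(addr, len) ((void)(addr), (void)(len), 0)
#endif

#define BUF_SIZE 256  /* hypothetical buffer size, for illustration only */

/*
 * Hypothetical helper: allocate a message buffer, fill in only the first
 * `used` bytes, and tell memcheck to treat the untouched tail as defined
 * instead of paying for a memset of the whole buffer.
 * Returns NULL on allocation failure or if `used` exceeds BUF_SIZE.
 */
char *make_msg_buffer(const char *payload, size_t used)
{
    char *buf;

    assert(payload != NULL);
    if (used > BUF_SIZE) {
        return NULL;
    }
    buf = malloc(BUF_SIZE);
    if (buf == NULL) {
        return NULL;
    }

    /* Only the portion the message actually needs is written. */
    memcpy(buf, payload, used);

    /* Mark the rest as defined so a later write()/writev() of the
     * buffer does not trigger "uninitialised byte(s)" reports. */
    (void)VALGRIND_MAKE_MEM_DEFINED(buf + used, BUF_SIZE - used);

    return buf;
}
```

The caller owns the returned buffer and frees it with free(). One
tradeoff worth keeping in mind: once a range is marked defined, memcheck
will no longer report genuine uninitialised reads from it either, so the
marked range should be as narrow as possible.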