On 06/30/2016 02:55 PM, Orion Poplawski wrote:
> valgrind output:
>
> $ valgrind mpiexec -n 6 ./testphdf5
> ==8518== Memcheck, a memory error detector
> ==8518== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> ==8518== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
> ==8518== Command: mpiexec -n 6 ./testphdf5
> ==8518==
> ==8518== Conditional jump or move depends on uninitialised value(s)
> ==8518==    at 0x401C724: index (in /usr/lib/ld-2.23.90.so)
> ==8518==
> ==8518== Conditional jump or move depends on uninitialised value(s)
> ==8518==    at 0x401C728: index (in /usr/lib/ld-2.23.90.so)
> ==8518==
> ==8518== Conditional jump or move depends on uninitialised value(s)
> ==8518==    at 0x4008C04: fillin_rpath (in /usr/lib/ld-2.23.90.so)
> ==8518==    by 0x4009423: _dl_init_paths (in /usr/lib/ld-2.23.90.so)
> ==8518==
> ==8518== Conditional jump or move depends on uninitialised value(s)
> ==8518==    at 0x4016D48: dl_open_worker (in /usr/lib/ld-2.23.90.so)
> ==8518==
> ==8518== Conditional jump or move depends on uninitialised value(s)
> ==8518==    at 0x4009858: _dl_map_object (in /usr/lib/ld-2.23.90.so)
> ==8518==    by 0x4016DA3: dl_open_worker (in /usr/lib/ld-2.23.90.so)
> ==8518==
> ==8518== Invalid read of size 4
> ==8518==    at 0x401C724: index (in /usr/lib/ld-2.23.90.so)
> ==8518==  Address 0x4d1b7bc is 1 bytes after a block of size 43 alloc'd
> ==8518==    at 0x4849584: malloc (vg_replace_malloc.c:299)
> ==8518==    by 0x4BCB75F: __vasprintf_chk (in /usr/lib/libc-2.23.90.so)
> ==8518==    by 0x4BCB633: __asprintf_chk (in /usr/lib/libc-2.23.90.so)
> ==8518==    by 0x49393E3: UnknownInlinedFun (stdio2.h:178)
> ==8518==    by 0x49393E3: dlopen_open (dl_dlopen_module.c:77)
> ==8518==    by 0x491B22B: open_component (mca_base_component_find.c:558)
> ==8518==    by 0x491C6C3: find_dyn_components (mca_base_component_find.c:446)
> ==8518==    by 0x491C6C3: mca_base_component_find (mca_base_component_find.c:190)
> ==8518==    by 0x4926D5F: mca_base_framework_components_register (mca_base_components_register.c:57)
> ==8518==    by 0x4927253: mca_base_framework_register (mca_base_framework.c:115)
> ==8518==    by 0x49272BB: mca_base_framework_open (mca_base_framework.c:134)
> ==8518==    by 0x48735D3: orte_init (orte_init.c:128)
> ==8518==    by 0x10C3F3: orterun (orterun.c:908)
> ==8518==    by 0x10B25F: main (main.c:13)
> ==8518==
>
> I think this is mainly harmless. Or at least not in openmpi.
>
> Then:
>
> aborting MPI processes
> [arm03-packager00.cloud.fedoraproject.org:08518] 4 more processes have sent help message help-mpi-api.txt / mpi-abort
> [arm03-packager00.cloud.fedoraproject.org:08518] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> ==8518== Syscall param write(buf) points to uninitialised byte(s)
> ==8518==    at 0x4ABA888: write (in /usr/lib/libpthread-2.23.90.so)
> ==8518==    by 0x50FAC9B: component_shutdown (oob_tcp_component.c:658)
> ==8518==    by 0x48A9F67: orte_oob_base_close (oob_base_frame.c:73)
> ==8518==    by 0x49273EF: mca_base_framework_close (mca_base_framework.c:198)
> ==8518==    by 0x50BC647: rte_finalize (ess_hnp_module.c:882)
> ==8518==    by 0x4873433: orte_finalize (orte_finalize.c:65)
> ==8518==    by 0x10D257: orterun (orterun.c:1151)
> ==8518==    by 0x10B25F: main (main.c:13)
> ==8518==  Address 0xbd828898 is on thread 1's stack
> ==8518==  in frame #1, created by component_shutdown (oob_tcp_component.c:647)
> ==8518==
> ==8518==
> ==8518== HEAP SUMMARY:
> ==8518==     in use at exit: 244,487 bytes in 773 blocks
> ==8518==   total heap usage: 14,898 allocs, 14,125 frees, 4,150,667 bytes allocated
> ==8518==
> ==8518== LEAK SUMMARY:
> ==8518==    definitely lost: 33,337 bytes in 23 blocks
> ==8518==    indirectly lost: 130,972 bytes in 20 blocks
> ==8518==      possibly lost: 2,368 bytes in 32 blocks
> ==8518==    still reachable: 77,810 bytes in 698 blocks
> ==8518==         suppressed: 0 bytes in 0 blocks
> ==8518== Rerun with --leak-check=full to see details of leaked memory
> ==8518==
> ==8518== For counts of detected and suppressed errors, rerun with: -v
> ==8518== Use --track-origins=yes to see where uninitialised values come from
> ==8518== ERROR SUMMARY: 310 errors from 8 contexts (suppressed: 0 from 0)
>

But I should note that this process exits fine. It seems like some kind of race, or otherwise sensitive to external conditions.

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       or...@nwra.com
Boulder, CO 80301                   http://www.nwra.com