On 06/30/2016 02:55 PM, Orion Poplawski wrote:
> valgrind output:
> 
> $ valgrind mpiexec -n 6 ./testphdf5
> ==8518== Memcheck, a memory error detector
> ==8518== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> ==8518== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
> ==8518== Command: mpiexec -n 6 ./testphdf5
> ==8518==
> ==8518== Conditional jump or move depends on uninitialised value(s)
> ==8518==    at 0x401C724: index (in /usr/lib/ld-2.23.90.so)
> ==8518==
> ==8518== Conditional jump or move depends on uninitialised value(s)
> ==8518==    at 0x401C728: index (in /usr/lib/ld-2.23.90.so)
> ==8518==
> ==8518== Conditional jump or move depends on uninitialised value(s)
> ==8518==    at 0x4008C04: fillin_rpath (in /usr/lib/ld-2.23.90.so)
> ==8518==    by 0x4009423: _dl_init_paths (in /usr/lib/ld-2.23.90.so)
> ==8518==
> ==8518== Conditional jump or move depends on uninitialised value(s)
> ==8518==    at 0x4016D48: dl_open_worker (in /usr/lib/ld-2.23.90.so)
> ==8518==
> ==8518== Conditional jump or move depends on uninitialised value(s)
> ==8518==    at 0x4009858: _dl_map_object (in /usr/lib/ld-2.23.90.so)
> ==8518==    by 0x4016DA3: dl_open_worker (in /usr/lib/ld-2.23.90.so)
> ==8518==
> ==8518== Invalid read of size 4
> ==8518==    at 0x401C724: index (in /usr/lib/ld-2.23.90.so)
> ==8518==  Address 0x4d1b7bc is 1 bytes after a block of size 43 alloc'd
> ==8518==    at 0x4849584: malloc (vg_replace_malloc.c:299)
> ==8518==    by 0x4BCB75F: __vasprintf_chk (in /usr/lib/libc-2.23.90.so)
> ==8518==    by 0x4BCB633: __asprintf_chk (in /usr/lib/libc-2.23.90.so)
> ==8518==    by 0x49393E3: UnknownInlinedFun (stdio2.h:178)
> ==8518==    by 0x49393E3: dlopen_open (dl_dlopen_module.c:77)
> ==8518==    by 0x491B22B: open_component (mca_base_component_find.c:558)
> ==8518==    by 0x491C6C3: find_dyn_components (mca_base_component_find.c:446)
> ==8518==    by 0x491C6C3: mca_base_component_find (mca_base_component_find.c:190)
> ==8518==    by 0x4926D5F: mca_base_framework_components_register (mca_base_components_register.c:57)
> ==8518==    by 0x4927253: mca_base_framework_register (mca_base_framework.c:115)
> ==8518==    by 0x49272BB: mca_base_framework_open (mca_base_framework.c:134)
> ==8518==    by 0x48735D3: orte_init (orte_init.c:128)
> ==8518==    by 0x10C3F3: orterun (orterun.c:908)
> ==8518==    by 0x10B25F: main (main.c:13)
> ==8518==
> 
> I think this is mainly harmless.  Or at least not in openmpi.
> 
> Then:
> 
> aborting MPI processes
> [arm03-packager00.cloud.fedoraproject.org:08518] 4 more processes have sent help message help-mpi-api.txt / mpi-abort
> [arm03-packager00.cloud.fedoraproject.org:08518] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> ==8518== Syscall param write(buf) points to uninitialised byte(s)
> ==8518==    at 0x4ABA888: write (in /usr/lib/libpthread-2.23.90.so)
> ==8518==    by 0x50FAC9B: component_shutdown (oob_tcp_component.c:658)
> ==8518==    by 0x48A9F67: orte_oob_base_close (oob_base_frame.c:73)
> ==8518==    by 0x49273EF: mca_base_framework_close (mca_base_framework.c:198)
> ==8518==    by 0x50BC647: rte_finalize (ess_hnp_module.c:882)
> ==8518==    by 0x4873433: orte_finalize (orte_finalize.c:65)
> ==8518==    by 0x10D257: orterun (orterun.c:1151)
> ==8518==    by 0x10B25F: main (main.c:13)
> ==8518==  Address 0xbd828898 is on thread 1's stack
> ==8518==  in frame #1, created by component_shutdown (oob_tcp_component.c:647)
> ==8518==
> ==8518==
> ==8518== HEAP SUMMARY:
> ==8518==     in use at exit: 244,487 bytes in 773 blocks
> ==8518==   total heap usage: 14,898 allocs, 14,125 frees, 4,150,667 bytes allocated
> ==8518==
> ==8518== LEAK SUMMARY:
> ==8518==    definitely lost: 33,337 bytes in 23 blocks
> ==8518==    indirectly lost: 130,972 bytes in 20 blocks
> ==8518==      possibly lost: 2,368 bytes in 32 blocks
> ==8518==    still reachable: 77,810 bytes in 698 blocks
> ==8518==         suppressed: 0 bytes in 0 blocks
> ==8518== Rerun with --leak-check=full to see details of leaked memory
> ==8518==
> ==8518== For counts of detected and suppressed errors, rerun with: -v
> ==8518== Use --track-origins=yes to see where uninitialised values come from
> ==8518== ERROR SUMMARY: 310 errors from 8 contexts (suppressed: 0 from 0)
> 

But I should note that this process exits fine.  It seems to be some kind of
race, or is otherwise sensitive to external conditions.
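
One caveat on the trace above: valgrind does not follow child processes by
default, so "valgrind mpiexec ..." checks only the launcher (the stack traces
end in orterun), not the testphdf5 ranks. A possible follow-up run, sketched
here on the assumption that valgrind and mpiexec are on PATH and ./testphdf5
is in the current directory, would put valgrind after mpiexec and turn on the
two options the output itself suggests. The variable names and the log file
name are illustrative choices, not something from the original run.

```shell
#!/bin/sh
# Sketch only: NP, VG_OPTS and the log file name are illustrative assumptions.
NP=6

# --track-origins and --leak-check=full are the options valgrind's own
# output recommends; %p in --log-file expands to the PID, one log per rank.
VG_OPTS="--track-origins=yes --leak-check=full --log-file=vg.%p.log"

# valgrind goes after mpiexec so that each rank is checked,
# rather than only the mpiexec/orterun launcher.
CMD="mpiexec -n $NP valgrind $VG_OPTS ./testphdf5"
echo "$CMD"

# Only run when the tools and the test binary are actually present.
if command -v valgrind >/dev/null 2>&1 \
   && command -v mpiexec >/dev/null 2>&1 \
   && [ -x ./testphdf5 ]; then
    $CMD
fi
```

If the failure really is timing-dependent, the slowdown from valgrind may well
hide it, so a clean run here would not rule out a race.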


-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       or...@nwra.com
Boulder, CO 80301                   http://www.nwra.com
