Justin, can you try setting mpi_leave_pinned to 0 to disable
libptmalloc2 and confirm whether this is related to ptmalloc?
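For reference, a minimal sketch of one way to pass that parameter. It
assumes the binary is ./hacc_tpm (taken from the backtrace) and reuses
the mpirun options from the original report. Adjust both to the real
command. Whether this fully bypasses ptmalloc2 on this build is exactly
what the test is meant to show.

```shell
# Sketch only: ./hacc_tpm is assumed from the backtrace; adjust as needed.
# Open MPI reads MCA parameters from OMPI_MCA_<name> environment variables.
export OMPI_MCA_mpi_leave_pinned=0

# Equivalent command-line form: mpirun --mca mpi_leave_pinned 0 ...
# The guard below is illustrative; it only keeps the snippet harmless
# on a machine where mpirun or the binary is not present.
if command -v mpirun >/dev/null 2>&1 && [ -x ./hacc_tpm ]; then
    mpirun --oversubscribe -np 1 ./hacc_tpm
fi
```

If the segfault disappears with this setting, that points at the
memory hooks. If it persists, the cause is probably elsewhere.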
Thanks,
Sylvain
On 06/19/2017 03:05 PM, Justin Luitjens wrote:
I have an application that works on other systems but on the current
system I’m running I’m seeing the following crash:
[dt04:22457] *** Process received signal ***
[dt04:22457] Signal: Segmentation fault (11)
[dt04:22457] Signal code: Address not mapped (1)
[dt04:22457] Failing at address: 0x55556a1da250
[dt04:22457] [ 0] /lib64/libpthread.so.0(+0xf370)[0x2aaaab353370]
[dt04:22457] [ 1] /home/jluitjens/libs/openmpi/lib/libopen-pal.so.13(opal_memory_ptmalloc2_int_free+0x50)[0x2aaaacbcf810]
[dt04:22457] [ 2] /home/jluitjens/libs/openmpi/lib/libopen-pal.so.13(opal_memory_ptmalloc2_free+0x9b)[0x2aaaacbcff3b]
[dt04:22457] [ 3] ./hacc_tpm[0x42f068]
[dt04:22457] [ 4] ./hacc_tpm[0x42f231]
[dt04:22457] [ 5] ./hacc_tpm[0x40f64d]
[dt04:22457] [ 6] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaaac30db35]
[dt04:22457] [ 7] ./hacc_tpm[0x4115cf]
[dt04:22457] *** End of error message ***
This app is a CUDA app but doesn’t use GPU direct so that should be
irrelevant.
I’m building with gcc/5.3.0, cuda/8.0.44, and openmpi/1.10.7.
I’m running on CentOS 7 with a vanilla Open MPI configure line:
./configure --prefix=/home/jluitjens/libs/openmpi/
Currently I’m trying to do this with just a single MPI process but
multiple MPI processes fail in the same way:
mpirun --oversubscribe -np 1 ./command
What is odd is that the crash occurs in roughly the same region of the
code, but not consistently at the same spot. At the time of the crash,
the application's single thread is nowhere near MPI code; it is just
using malloc to allocate some memory. This makes me think the crash is
caused by a thread outside the application I’m working on (perhaps in
Open MPI itself), or by Open MPI hijacking malloc/free.
Does anyone have any ideas about what I could try to work around this issue?
Thanks,
Justin
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users