Justin, can you try setting mpi_leave_pinned to 0 to disable libptmalloc2 and confirm whether this is related to ptmalloc?
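A minimal sketch of how that could look, assuming the same mpirun invocation as in the original message (./command stands in for the real binary). MCA parameters can be passed on the command line with --mca, or through an OMPI_MCA_-prefixed environment variable:

```shell
# Option 1: pass the MCA parameter directly on the mpirun command line
mpirun --mca mpi_leave_pinned 0 --oversubscribe -np 1 ./command

# Option 2: set it through the environment before launching
export OMPI_MCA_mpi_leave_pinned=0
mpirun --oversubscribe -np 1 ./command
```

If the crash disappears with either form, that points toward the memory hooks rather than the application itself.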

Thanks,
Sylvain

On 06/19/2017 03:05 PM, Justin Luitjens wrote:

I have an application that works on other systems, but on the system I’m currently using I’m seeing the following crash:

[dt04:22457] *** Process received signal ***
[dt04:22457] Signal: Segmentation fault (11)
[dt04:22457] Signal code: Address not mapped (1)
[dt04:22457] Failing at address: 0x55556a1da250
[dt04:22457] [ 0] /lib64/libpthread.so.0(+0xf370)[0x2aaaab353370]
[dt04:22457] [ 1] /home/jluitjens/libs/openmpi/lib/libopen-pal.so.13(opal_memory_ptmalloc2_int_free+0x50)[0x2aaaacbcf810]
[dt04:22457] [ 2] /home/jluitjens/libs/openmpi/lib/libopen-pal.so.13(opal_memory_ptmalloc2_free+0x9b)[0x2aaaacbcff3b]
[dt04:22457] [ 3] ./hacc_tpm[0x42f068]
[dt04:22457] [ 4] ./hacc_tpm[0x42f231]
[dt04:22457] [ 5] ./hacc_tpm[0x40f64d]
[dt04:22457] [ 6] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaaac30db35]
[dt04:22457] [ 7] ./hacc_tpm[0x4115cf]
[dt04:22457] *** End of error message ***

This app is a CUDA app but doesn’t use GPU direct so that should be irrelevant.

I’m building with gcc/5.3.0, cuda/8.0.44, and openmpi/1.10.7.

I’m running this on CentOS 7 with a vanilla MPI configure line: ./configure --prefix=/home/jluitjens/libs/openmpi/

Currently I’m running with just a single MPI process, but runs with multiple MPI processes fail in the same way:

mpirun  --oversubscribe -np 1 ./command

What is odd is that the crash occurs around the same spot in the code, but not consistently at exactly the same spot. At the time of the crash, the application’s single thread is nowhere near MPI code; it is just using malloc to allocate some memory. This makes me think the crash is caused either by a thread outside of the application I’m working on (perhaps in Open MPI itself) or by Open MPI hijacking malloc/free.

Does anyone have any ideas of what I could try to work around this issue?

Thanks,

Justin



_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

