Hi Justin,

If you can build the application in debug mode, try inserting valgrind into
your MPI command. It's usually very good at tracking down the origins of
failing memory allocations.
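
For example, something along these lines usually works (just a sketch; adjust
the valgrind options and the application arguments for your setup):

  mpirun --oversubscribe -np 1 valgrind --leak-check=full --track-origins=yes ./command

valgrind will report invalid reads/writes and show where the offending blocks
were allocated, which often points straight at the heap corruption.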

Kind regards,
- Dmitry.


2017-06-20 1:10 GMT+03:00 Sylvain Jeaugey <sjeau...@nvidia.com>:

> Justin, can you try setting mpi_leave_pinned to 0 to disable libptmalloc2
> and confirm whether this is related to ptmalloc?
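>
> For example, something like this should do it (a sketch based on your
> existing launch line; the parameter can also be set via the
> OMPI_MCA_mpi_leave_pinned environment variable):
>
>   mpirun --mca mpi_leave_pinned 0 --oversubscribe -np 1 ./command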
>
> Thanks,
> Sylvain
> On 06/19/2017 03:05 PM, Justin Luitjens wrote:
>
> I have an application that works on other systems, but on the system I'm
> currently running on I'm seeing the following crash:
>
> [dt04:22457] *** Process received signal ***
> [dt04:22457] Signal: Segmentation fault (11)
> [dt04:22457] Signal code: Address not mapped (1)
> [dt04:22457] Failing at address: 0x55556a1da250
> [dt04:22457] [ 0] /lib64/libpthread.so.0(+0xf370)[0x2aaaab353370]
> [dt04:22457] [ 1] /home/jluitjens/libs/openmpi/lib/libopen-pal.so.13(opal_memory_ptmalloc2_int_free+0x50)[0x2aaaacbcf810]
> [dt04:22457] [ 2] /home/jluitjens/libs/openmpi/lib/libopen-pal.so.13(opal_memory_ptmalloc2_free+0x9b)[0x2aaaacbcff3b]
> [dt04:22457] [ 3] ./hacc_tpm[0x42f068]
> [dt04:22457] [ 4] ./hacc_tpm[0x42f231]
> [dt04:22457] [ 5] ./hacc_tpm[0x40f64d]
> [dt04:22457] [ 6] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaaac30db35]
> [dt04:22457] [ 7] ./hacc_tpm[0x4115cf]
> [dt04:22457] *** End of error message ***
>
> This app is a CUDA app, but it doesn't use GPUDirect, so that should be
> irrelevant.
>
> I'm building with gcc/5.3.0, cuda/8.0.44, and openmpi/1.10.7.
>
> I'm running this on CentOS 7 and am using a vanilla Open MPI configure line:
> ./configure --prefix=/home/jluitjens/libs/openmpi/
>
> Currently I'm trying this with just a single MPI process, but multiple MPI
> processes fail in the same way:
>
> mpirun --oversubscribe -np 1 ./command
>
> What is odd is that the crash occurs around the same spot in the code, but
> not consistently at exactly the same spot. The spot where the single thread
> is at the time of the crash is nowhere near any MPI code; the code that is
> crashing is just using malloc to allocate some memory. This makes me think
> the crash is caused either by a thread outside the application I'm working
> on (perhaps in Open MPI itself) or by Open MPI hijacking malloc/free.
>
> Does anyone have any ideas about what I could try to work around this issue?
>
> Thanks,
>
> Justin
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
