Hi,

I'm encountering some issues when running a multithreaded program with Open MPI (trunk rev. 21380, configured with --enable-mpi-threads).

My program (included in the attached tar.bz2) uses several pthreads that perform ping-pongs concurrently (thread #1 uses tag #1, thread #2 uses tag #2, etc.). This program crashes over MX (with either the BTL or the MTL) with the following backtrace:

concurrent_ping_v2: pml_cm_recvreq.c:53: mca_pml_cm_recv_request_completion: Assertion `0 == ((mca_pml_cm_thin_recv_request_t*)base_request)->req_base.req_pml_complete' failed.
[joe0:01709] *** Process received signal ***
[joe0:01709] *** Process received signal ***
[joe0:01709] Signal: Segmentation fault (11)
[joe0:01709] Signal code: Address not mapped (1)
[joe0:01709] Failing at address: 0x1238949c4
[joe0:01709] Signal: Aborted (6)
[joe0:01709] Signal code: (-6)
[joe0:01709] [ 0] /lib/libpthread.so.0 [0x7f57240be7b0]
[joe0:01709] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7f5722cba065]
[joe0:01709] [ 2] /lib/libc.so.6(abort+0x183) [0x7f5722cbd153]
[joe0:01709] [ 3] /lib/libc.so.6(__assert_fail+0xe9) [0x7f5722cb3159]
[joe0:01709] [ 0] /lib/libpthread.so.0 [0x7f57240be7b0]
[joe0:01709] [ 1] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0 [0x7f57238d0a08]
[joe0:01709] [ 2] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0 [0x7f57238cf8cc]
[joe0:01709] [ 3] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0(opal_free+0x4e) [0x7f57238bdc69]
[joe0:01709] [ 4] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_mtl_mx.so [0x7f572060b72f]
[joe0:01709] [ 5] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0(opal_progress+0xbc) [0x7f57238948e0]
[joe0:01709] [ 6] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f572081145a]
[joe0:01709] [ 7] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f57208113b7]
[joe0:01709] [ 8] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f57208112e7]
[joe0:01709] [ 9] /home/ftrahay/sources/openmpi/trunk/install//lib/libmpi.so.0(MPI_Recv+0x2bc) [0x7f5723e07690]
[joe0:01709] [10] ./concurrent_ping_v2(client+0x123) [0x401404]
[joe0:01709] [11] /lib/libpthread.so.0 [0x7f57240b6faa]
[joe0:01709] [12] /lib/libc.so.6(clone+0x6d) [0x7f5722d5629d]
[joe0:01709] *** End of error message ***
[joe0:01709] [ 4] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f57208120bb]
[joe0:01709] [ 5] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_mtl_mx.so [0x7f572060b80a]
[joe0:01709] [ 6] /home/ftrahay/sources/openmpi/trunk/install//lib/libopen-pal.so.0(opal_progress+0xbc) [0x7f57238948e0]
[joe0:01709] [ 7] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f572081147a]
[joe0:01709] [ 8] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f57208113b7]
[joe0:01709] [ 9] /home/ftrahay/sources/openmpi/trunk/install/lib/openmpi/mca_pml_cm.so [0x7f57208112e7]
[joe0:01709] [10] /home/ftrahay/sources/openmpi/trunk/install//lib/libmpi.so.0(MPI_Recv+0x2bc) [0x7f5723e07690]
[joe0:01709] [11] ./concurrent_ping_v2(client+0x123) [0x401404]
[joe0:01709] [12] /lib/libpthread.so.0 [0x7f57240b6faa]
[joe0:01709] [13] /lib/libc.so.6(clone+0x6d) [0x7f5722d5629d]
[joe0:01709] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 1709 on node joe0 exited on signal 6 (Aborted).
--------------------------------------------------------------------------

Any ideas?

Francois Trahay
bug-report.tar.bz2
Description: application/bzip