What does the stack look like for the run without the sm BTL? I assume it
is probably the same. Also, can you dump the message queues for rank 1?
Interestingly, you have a number of pending receives. Did you expect that
to be the case when the MPI_Gatherv call occurred?
--td
Teng Lin wrote:
Hi,
We recently ran into a deadlock when calling MPI_Gatherv with Open MPI 1.3.4.
At first it seemed to be related to the sm BTL. However, it still hangs even
after the sm BTL is turned off.
Any idea how to track down the problem?
Thanks,
Teng
#################################################
Stack trace for master node
#################################################
mca_btl_sm_component_progress
opal_progress
opal_condition_wait
ompi_request_default_wait_all
ompi_coll_tuned_sendrecv_actual
ompi_coll_tuned_barrier_intra_two_procs
ompi_coll_tuned_barrier_intra_dec_fixed
mca_coll_sync_gatherv
PMPI_Gatherv
#################################################
Stack trace for slave node
#################################################
mca_btl_sm_component_progress
opal_progress
opal_condition_wait
ompi_request_wait_completion
mca_pml_ob1_recv
mca_coll_basic_gatherv_intra
mca_coll_sync_gatherv
#################################################
Message queue from TotalView
#################################################
MPI_COMM_WORLD
Comm_size 2
Comm_rank 0
Pending receives : none
Unexpected messages : no information available
Pending sends : none
MPI_COMM_SELF
Comm_size 1
Comm_rank 0
Pending receives : none
Unexpected messages : no information available
Pending sends : none
MPI_COMM_NULL
Comm_size 0
Comm_rank -2
Pending receives : none
Unexpected messages : no information available
Pending sends : none
MPI COMMUNICATOR 3 DUP FROM 0
Comm_size 2
Comm_rank 0
Pending receives : none
Unexpected messages : no information available
Pending sends : none
MPI COMMUNICATOR 4 SPLIT FROM 3
Comm_size 2
Comm_rank 0
Pending receives : none
Unexpected messages : no information available
Pending sends : none
MPI COMMUNICATOR 5 SPLIT FROM 4
Comm_size 2
Comm_rank 0
Pending receives : none
Unexpected messages : no information available
Pending sends : none
MPI COMMUNICATOR 6 SPLIT FROM 4
Comm_size 1
Comm_rank 0
Pending receives : none
Unexpected messages : no information available
Pending sends : none
MPI COMMUNICATOR 7 DUP FROM 4
Comm_size 2
Comm_rank 0
Pending receives
[0]
Receive: 0x80b9000
Data: 1 * MPI_CHAR
Status Pending
Source 0 (orterun<xxxx>.0)
Tag 7 (0x00000007)
User Buffer 0xb06fa010 -> 0x00000000 (0)
Buffer Length 1359312 (0x0014bdd0)
[1]
Receive: 0x80b9200
Data: 1 * MPI_CHAR
Status Pending
Source 0 (orterun<xxxx>.0)
Tag 5 (0x00000005)
User Buffer 0xb0c2a010 -> 0x00000000 (0)
Buffer Length 1359312 (0x0014bdd0)
[2]
Receive: 0x80b9400
Data: 1 * MPI_CHAR
Status Pending
Source 1 (orterun<xxxx>.1)
Tag 3 (0x00000003)
User Buffer 0xb115a010 -> 0xc0ef9e79 (-1058038151)
Buffer Length 1359312 (0x0014bdd0)
[3]
Receive: 0x80b9600
Data: 1 * MPI_CHAR
Status Pending
Source 1 (orterun<xxxx>.1)
Tag 1 (0x00000001)
User Buffer 0xb168a010 -> 0xc0c662aa (-1060740438)
Buffer Length 1359312 (0x0014bdd0)
[4]
Receive: 0x82a2500
Data: 1 * MPI_CHAR
Status Pending
Source 0 (orterun<xxxx>.0)
Tag 11 (0x0000000b)
User Buffer 0xafc9a010 -> 0x00000000 (0)
Buffer Length 1359312 (0x0014bdd0)
[5]
Receive: 0x82a2700
Data: 1 * MPI_CHAR
Status Pending
Source 0 (orterun<xxxx>.0)
Tag 9 (0x00000009)
User Buffer 0xb01ca010 -> 0x00000000 (0)
Buffer Length 1359312 (0x0014bdd0)
Unexpected messages : no information available
Pending sends
[0]
Send: 0x80b8500
Data transfer completed
Status Complete
Target 0 (orterun<xxxx>.0)
Tag 4 (0x00000004)
Buffer 0xb0846010 -> 0x40544279 (1079263865)
Buffer Length 2548 (0x000009f4)
[1]
Send: 0x80b8780
Data transfer completed
Status Complete
Target 0 (orterun<xxxx>.0)
Tag 6 (0x00000006)
Buffer 0xb0d76010 -> 0x41a756bf (1101485759)
Buffer Length 2992 (0x00000bb0)
[2]
Send: 0x80b8a00
Data transfer completed
Status Complete
Target 1 (orterun<xxxx>.1)
Tag 0 (0x00000000)
Buffer 0xb12a6010 -> 0xbf94cfc4 (-1080766524)
Buffer Length 3856 (0x00000f10)
[3]
Send: 0x80b8c80
Data transfer completed
Status Complete
Target 1 (orterun<xxxx>.1)
Tag 2 (0x00000002)
Buffer 0xb17d6010 -> 0x400a1a6c (1074403948)
Buffer Length 3952 (0x00000f70)
[4]
Send: 0x831f080
Data transfer completed
Status Complete
Target 0 (orterun<xxxx>.0)
Tag 8 (0x00000008)
Buffer 0xafde6010 -> 0xc0de2c50 (-1059181488)
Buffer Length 3292 (0x00000cdc)
[5]
Send: 0x831f300
Data transfer completed
Status Complete
Target 0 (orterun<xxxx>.0)
Tag 10 (0x0000000a)
Buffer 0xb0316010 -> 0x41169ca7 (1092000935)
Buffer Length 3232 (0x00000ca0)
MPI COMMUNICATOR 8 SPLIT FROM 5
Comm_size 2
Comm_rank 0
Pending receives : none
Unexpected messages : no information available
Pending sends : none
MPI COMMUNICATOR 9 SPLIT FROM 5
Comm_size 2
Comm_rank 0
Pending receives : none
Unexpected messages : no information available
Pending sends : none
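
When comparing dumps like this across ranks, it can help to tally the pending receives by communicator, source, and tag instead of reading the listing by eye. The following is a minimal sketch only: `tally_pending_receives` is a hypothetical helper (not part of TotalView or Open MPI), and it assumes the dump has been saved as plain text in the same layout as the listing above.

```python
import re
from collections import Counter


def tally_pending_receives(dump: str) -> Counter:
    """Count pending receives per (communicator, source, tag).

    Assumes the plain-text layout shown in the dump above: a communicator
    header line starting with "MPI", then "Pending receives",
    "Unexpected messages" and "Pending sends" sections.
    """
    counts = Counter()
    comm = None          # current communicator header line
    in_receives = False  # True while inside a "Pending receives" section
    source = None        # source rank of the entry being read
    for raw in dump.splitlines():
        line = raw.strip()
        if line.startswith("MPI"):
            # New communicator block, e.g. "MPI COMMUNICATOR 7 DUP FROM 4".
            comm, in_receives, source = line, False, None
        elif line.startswith("Pending receives"):
            in_receives = True
        elif line.startswith(("Unexpected messages", "Pending sends")):
            in_receives = False
        elif in_receives:
            m = re.match(r"Source\s+(-?\d+)", line)
            if m:
                source = int(m.group(1))
                continue
            m = re.match(r"Tag\s+(-?\d+)", line)
            if m and source is not None:
                counts[(comm, source, int(m.group(1)))] += 1
                source = None
    return counts
```

Running this over the dump from each rank and comparing the tallies would show which (source, tag) pairs are still outstanding on each side. In the listing above, for example, all six pending receives are on communicator 7: four from source 0 and two from source 1.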
--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com