Hi,

We recently ran into a deadlock when calling MPI_Gatherv with Open MPI 1.3.4. At first it seemed to have something to do with the sm BTL. However, it still hangs even after turning the sm BTL off.
Any idea how to track down the problem?

Thanks,
Teng

#################################################
Stack trace for master node
#################################################
mca_btl_sm_component_progress
opal_progress
opal_condition_wait
ompi_request_default_wait_all
ompi_coll_tuned_sendrecv_actual
ompi_coll_tuned_barrier_intra_two_procs
ompi_coll_tuned_barrier_intra_dec_fixed
mca_coll_sync_gatherv
PMPI_Gatherv

#################################################
Stack trace for slave node
#################################################
mca_btl_sm_component_progress
opal_progress
opal_condition_wait
ompi_request_wait_completion
mca_pml_ob1_recv
mca_coll_basic_gatherv_intra
mca_coll_sync_gatherv

#################################################
Message queue from totalview
#################################################
MPI_COMM_WORLD
Comm_size 2
Comm_rank 0
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

MPI_COMM_SELF
Comm_size 1
Comm_rank 0
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

MPI_COMM_NULL
Comm_size 0
Comm_rank -2
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

MPI COMMUNICATOR 3 DUP FROM 0
Comm_size 2
Comm_rank 0
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

MPI COMMUNICATOR 4 SPLIT FROM 3
Comm_size 2
Comm_rank 0
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

MPI COMMUNICATOR 5 SPLIT FROM 4
Comm_size 2
Comm_rank 0
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

MPI COMMUNICATOR 6 SPLIT FROM 4
Comm_size 1
Comm_rank 0
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

MPI COMMUNICATOR 7 DUP FROM 4
Comm_size 2
Comm_rank 0
Pending receives
[0]
    Receive: 0x80b9000
    Data: 1 * MPI_CHAR
    Status          Pending
    Source          0 (orterun<xxxx>.0)
    Tag             7 (0x00000007)
    User Buffer     0xb06fa010 -> 0x00000000 (0)
    Buffer Length   1359312 (0x0014bdd0)
[1]
    Receive: 0x80b9200
    Data: 1 * MPI_CHAR
    Status          Pending
    Source          0 (orterun<xxxx>.0)
    Tag             5 (0x00000005)
    User Buffer     0xb0c2a010 -> 0x00000000 (0)
    Buffer Length   1359312 (0x0014bdd0)
[2]
    Receive: 0x80b9400
    Data: 1 * MPI_CHAR
    Status          Pending
    Source          1 (orterun<xxxx>.1)
    Tag             3 (0x00000003)
    User Buffer     0xb115a010 -> 0xc0ef9e79 (-1058038151)
    Buffer Length   1359312 (0x0014bdd0)
[3]
    Receive: 0x80b9600
    Data: 1 * MPI_CHAR
    Status          Pending
    Source          1 (orterun<xxxx>.1)
    Tag             1 (0x00000001)
    User Buffer     0xb168a010 -> 0xc0c662aa (-1060740438)
    Buffer Length   1359312 (0x0014bdd0)
[4]
    Receive: 0x82a2500
    Data: 1 * MPI_CHAR
    Status          Pending
    Source          0 (orterun<xxxx>.0)
    Tag             11 (0x0000000b)
    User Buffer     0xafc9a010 -> 0x00000000 (0)
    Buffer Length   1359312 (0x0014bdd0)
[5]
    Receive: 0x82a2700
    Data: 1 * MPI_CHAR
    Status          Pending
    Source          0 (orterun<xxxx>.0)
    Tag             9 (0x00000009)
    User Buffer     0xb01ca010 -> 0x00000000 (0)
    Buffer Length   1359312 (0x0014bdd0)
Unexpected messages : no information available
Pending sends
[0]
    Send: 0x80b8500
    Data transfer completed
    Status          Complete
    Target          0 (orterun<xxxx>.0)
    Tag             4 (0x00000004)
    Buffer          0xb0846010 -> 0x40544279 (1079263865)
    Buffer Length   2548 (0x000009f4)
[1]
    Send: 0x80b8780
    Data transfer completed
    Status          Complete
    Target          0 (orterun<xxxx>.0)
    Tag             6 (0x00000006)
    Buffer          0xb0d76010 -> 0x41a756bf (1101485759)
    Buffer Length   2992 (0x00000bb0)
[2]
    Send: 0x80b8a00
    Data transfer completed
    Status          Complete
    Target          1 (orterun<xxxx>.1)
    Tag             0 (0x00000000)
    Buffer          0xb12a6010 -> 0xbf94cfc4 (-1080766524)
    Buffer Length   3856 (0x00000f10)
[3]
    Send: 0x80b8c80
    Data transfer completed
    Status          Complete
    Target          1 (orterun<xxxx>.1)
    Tag             2 (0x00000002)
    Buffer          0xb17d6010 -> 0x400a1a6c (1074403948)
    Buffer Length   3952 (0x00000f70)
[4]
    Send: 0x831f080
    Data transfer completed
    Status          Complete
    Target          0 (orterun<xxxx>.0)
    Tag             8 (0x00000008)
    Buffer          0xafde6010 -> 0xc0de2c50 (-1059181488)
    Buffer Length   3292 (0x00000cdc)
[5]
    Send: 0x831f300
    Data transfer completed
    Status          Complete
    Target          0 (orterun<xxxx>.0)
    Tag             10 (0x0000000a)
    Buffer          0xb0316010 -> 0x41169ca7 (1092000935)
    Buffer Length   3232 (0x00000ca0)

MPI COMMUNICATOR 8 SPLIT FROM 5
Comm_size 2
Comm_rank 0
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

MPI COMMUNICATOR 9 SPLIT FROM 5
Comm_size 2
Comm_rank 0
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

Attachment: deadlock.log