Can you provide a small chunk of code that replicates the problem, perchance?

On Apr 27, 2010, at 9:22 AM, Terry Dontje wrote:

> How does the stack for the non-SM BTL run look, I assume it probably is the
> same? Also, can you dump the message queues for rank 1? What's interesting
> is you have a bunch of pending receives, do you expect that to be the case
> when the MPI_Gatherv occurred?
>
> --td
>
> Teng Lin wrote:
>> Hi,
>>
>> We recently ran into deadlock when calling MPI_gatherv with Open MPI 1.3.4.
>> It seems to have something to do with sm at first. However, it still hangs
>> even after turning off sm btl.
>>
>> Any idea how to track down the problem?
>>
>> Thanks,
>> Teng
>>
>> #################################################
>> Stack trace for master node
>> #################################################
>> mca_btl_sm_component_progress
>> opal_progress
>> opal_condition_wait
>> ompi_request_default_wait_all
>> ompi_coll_tuned_sendrecv_actual
>> ompi_coll_tuned_barrier_intra_two_procs
>> ompi_coll_tuned_barrier_intra_dec_fixed
>> mca_coll_sync_gatherv
>> PMPI_Gatherv
>>
>>
>> #################################################
>> Stack trace for slave node
>> #################################################
>> mca_btl_sm_component_progress
>> opal_progress
>> opal_condition_wait
>> ompi_request_wait_completion
>> mca_pml_ob1_recv
>> mca_coll_basic_gatherv_intra
>> mca_coll_sync_gatherv
>>
>>
>> #################################################
>> Message queue from totalview
>> ################################################
>> MPI_COMM_WORLD
>> Comm_size 2
>> Comm_rank 0
>> Pending receives : none
>> Unexpected messages : no information available
>> Pending sends : none
>>
>> MPI_COMM_SELF
>> Comm_size 1
>> Comm_rank 0
>> Pending receives : none
>> Unexpected messages : no information available
>> Pending sends : none
>>
>> MPI_COMM_NULL
>> Comm_size 0
>> Comm_rank -2
>> Pending receives : none
>> Unexpected messages : no information available
>> Pending sends : none
>>
>> MPI COMMUNICATOR 3 DUP FROM 0
>> Comm_size 2
>> Comm_rank 0
>> Pending receives : none
>> Unexpected messages : no information available
>> Pending sends : none
>>
>> MPI COMMUNICATOR 4 SPLIT FROM 3
>> Comm_size 2
>> Comm_rank 0
>> Pending receives : none
>> Unexpected messages : no information available
>> Pending sends : none
>>
>> MPI COMMUNICATOR 5 SPLIT FROM 4
>> Comm_size 2
>> Comm_rank 0
>> Pending receives : none
>> Unexpected messages : no information available
>> Pending sends : none
>>
>> MPI COMMUNICATOR 6 SPLIT FROM 4
>> Comm_size 1
>> Comm_rank 0
>> Pending receives : none
>> Unexpected messages : no information available
>> Pending sends : none
>>
>> MPI COMMUNICATOR 7 DUP FROM 4
>> Comm_size 2
>> Comm_rank 0
>> Pending receives
>> [0]
>>   Receive: 0x80b9000
>>   Data: 1 * MPI_CHAR
>>   Status Pending
>>   Source 0 (orterun<xxxx>.0)
>>   Tag 7 (0x00000007)
>>   User Buffer 0xb06fa010 -> 0x00000000 (0)
>>   Buffer Length 1359312 (0x0014bdd0)
>> [1]
>>   Receive: 0x80b9200
>>   Data: 1 * MPI_CHAR
>>   Status Pending
>>   Source 0 (orterun<xxxx>.0)
>>   Tag 5 (0x00000005)
>>   User Buffer 0xb0c2a010 -> 0x00000000 (0)
>>   Buffer Length 1359312 (0x0014bdd0)
>> [2]
>>   Receive: 0x80b9400
>>   Data: 1 * MPI_CHAR
>>   Status Pending
>>   Source 1 (orterun<xxxx>.1)
>>   Tag 3 (0x00000003)
>>   User Buffer 0xb115a010 -> 0xc0ef9e79 (-1058038151)
>>   Buffer Length 1359312 (0x0014bdd0)
>> [3]
>>   Receive: 0x80b9600
>>   Data: 1 * MPI_CHAR
>>   Status Pending
>>   Source 1 (orterun<xxxx>.1)
>>   Tag 1 (0x00000001)
>>   User Buffer 0xb168a010 -> 0xc0c662aa (-1060740438)
>>   Buffer Length 1359312 (0x0014bdd0)
>> [4]
>>   Receive: 0x82a2500
>>   Data: 1 * MPI_CHAR
>>   Status Pending
>>   Source 0 (orterun<xxxx>.0)
>>   Tag 11 (0x0000000b)
>>   User Buffer 0xafc9a010 -> 0x00000000 (0)
>>   Buffer Length 1359312 (0x0014bdd0)
>> [5]
>>   Receive: 0x82a2700
>>   Data: 1 * MPI_CHAR
>>   Status Pending
>>   Source 0 (orterun<xxxx>.0)
>>   Tag 9 (0x00000009)
>>   User Buffer 0xb01ca010 -> 0x00000000 (0)
>>   Buffer Length 1359312 (0x0014bdd0)
>>
>> Unexpected messages : no information available
>> Pending sends
>> [0]
>>   Send: 0x80b8500
>>   Data transfer completed
>>   Status Complete
>>   Target 0 (orterun<xxxx>.0)
>>   Tag 4 (0x00000004)
>>   Buffer 0xb0846010 -> 0x40544279 (1079263865)
>>   Buffer Length 2548 (0x000009f4)
>> [1]
>>   Send: 0x80b8780
>>   Data transfer completed
>>   Status Complete
>>   Target 0 (orterun<xxxx>.0)
>>   Tag 6 (0x00000006)
>>   Buffer 0xb0d76010 -> 0x41a756bf (1101485759)
>>   Buffer Length 2992 (0x00000bb0)
>> [2]
>>   Send: 0x80b8a00
>>   Data transfer completed
>>   Status Complete
>>   Target 1 (orterun<xxxx>.1)
>>   Tag 0 (0x00000000)
>>   Buffer 0xb12a6010 -> 0xbf94cfc4 (-1080766524)
>>   Buffer Length 3856 (0x00000f10)
>> [3]
>>   Send: 0x80b8c80
>>   Data transfer completed
>>   Status Complete
>>   Target 1 (orterun<xxxx>.1)
>>   Tag 2 (0x00000002)
>>   Buffer 0xb17d6010 -> 0x400a1a6c (1074403948)
>>   Buffer Length 3952 (0x00000f70)
>> [4]
>>   Send: 0x831f080
>>   Data transfer completed
>>   Status Complete
>>   Target 0 (orterun<xxxx>.0)
>>   Tag 8 (0x00000008)
>>   Buffer 0xafde6010 -> 0xc0de2c50 (-1059181488)
>>   Buffer Length 3292 (0x00000cdc)
>> [5]
>>   Send: 0x831f300
>>   Data transfer completed
>>   Status Complete
>>   Target 0 (orterun<xxxx>.0)
>>   Tag 10 (0x0000000a)
>>   Buffer 0xb0316010 -> 0x41169ca7 (1092000935)
>>   Buffer Length 3232 (0x00000ca0)
>>
>> MPI COMMUNICATOR 8 SPLIT FROM 5
>> Comm_size 2
>> Comm_rank 0
>> Pending receives : none
>> Unexpected messages : no information available
>> Pending sends : none
>>
>> MPI COMMUNICATOR 9 SPLIT FROM 5
>> Comm_size 2
>> Comm_rank 0
>> Pending receives : none
>> Unexpected messages : no information available
>> Pending sends : none
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.650.633.7054
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/