Can you provide a small chunk of code that replicates the problem, perchance?

On Apr 27, 2010, at 9:22 AM, Terry Dontje wrote:

> How does the stack for the non-SM BTL run look? I assume it is probably the 
> same. Also, can you dump the message queues for rank 1? What's interesting 
> is that you have a bunch of pending receives; do you expect that to be the 
> case when the MPI_Gatherv occurred?
> 
> --td
> 
> Teng Lin wrote:
>> Hi,
>> 
>> We recently ran into a deadlock when calling MPI_Gatherv with Open MPI 1.3.4. 
>> At first it seemed to have something to do with the sm BTL. However, it still 
>> hangs even after turning off the sm BTL.
>> 
>> Any idea how to track down the problem?
>> 
>> Thanks,
>> Teng
>> 
>> #################################################
>> Stack trace for master node
>> #################################################
>> mca_btl_sm_component_progress
>> opal_progress
>> opal_condition_wait
>> ompi_request_default_wait_all
>> ompi_coll_tuned_sendrecv_actual
>> ompi_coll_tuned_barrier_intra_two_procs
>> ompi_coll_tuned_barrier_intra_dec_fixed
>> mca_coll_sync_gatherv
>> PMPI_Gatherv
>> 
>> 
>> #################################################
>> Stack trace for slave node
>> #################################################
>> mca_btl_sm_component_progress
>> opal_progress
>> opal_condition_wait
>> ompi_request_wait_completion
>> mca_pml_ob1_recv
>> mca_coll_basic_gatherv_intra
>> mca_coll_sync_gatherv
>> 
>> 
>> #################################################
>> Message queue from TotalView
>> #################################################
>> MPI_COMM_WORLD
>> Comm_size                2
>> Comm_rank                0
>> Pending receives    : none
>> Unexpected messages : no information available
>> Pending sends       : none
>> 
>> MPI_COMM_SELF
>> Comm_size                1
>> Comm_rank                0
>> Pending receives    : none
>> Unexpected messages : no information available
>> Pending sends       : none
>> 
>> MPI_COMM_NULL
>> Comm_size                0
>> Comm_rank               -2
>> Pending receives    : none
>> Unexpected messages : no information available
>> Pending sends       : none
>> 
>> MPI COMMUNICATOR 3 DUP FROM 0
>> Comm_size                2
>> Comm_rank                0
>> Pending receives    : none
>> Unexpected messages : no information available
>> Pending sends       : none
>> 
>> MPI COMMUNICATOR 4 SPLIT FROM 3
>> Comm_size                2
>> Comm_rank                0
>> Pending receives    : none
>> Unexpected messages : no information available
>> Pending sends       : none
>> 
>> MPI COMMUNICATOR 5 SPLIT FROM 4
>> Comm_size                2
>> Comm_rank                0
>> Pending receives    : none
>> Unexpected messages : no information available
>> Pending sends       : none
>> 
>> MPI COMMUNICATOR 6 SPLIT FROM 4
>> Comm_size                1
>> Comm_rank                0
>> Pending receives    : none
>> Unexpected messages : no information available
>> Pending sends       : none
>> 
>> MPI COMMUNICATOR 7 DUP FROM 4
>> Comm_size                2
>> Comm_rank                0
>> Pending receives   
>> [0]
>>    Receive: 0x80b9000
>>    Data: 1 * MPI_CHAR
>>    Status           Pending
>>    Source           0 (orterun<xxxx>.0)
>>    Tag              7 (0x00000007)
>>    User Buffer      0xb06fa010 -> 0x00000000 (0)
>>    Buffer Length    1359312 (0x0014bdd0)
>> [1]
>>    Receive: 0x80b9200
>>    Data: 1 * MPI_CHAR
>>    Status           Pending
>>    Source           0 (orterun<xxxx>.0)
>>    Tag              5 (0x00000005)
>>    User Buffer      0xb0c2a010 -> 0x00000000 (0)
>>    Buffer Length    1359312 (0x0014bdd0)
>> [2]
>>    Receive: 0x80b9400
>>    Data: 1 * MPI_CHAR
>>    Status           Pending
>>    Source           1 (orterun<xxxx>.1)
>>    Tag              3 (0x00000003)
>>    User Buffer      0xb115a010 -> 0xc0ef9e79 (-1058038151)
>>    Buffer Length    1359312 (0x0014bdd0)
>> [3]
>>    Receive: 0x80b9600
>>    Data: 1 * MPI_CHAR
>>    Status           Pending
>>    Source           1 (orterun<xxxx>.1)
>>    Tag              1 (0x00000001)
>>    User Buffer      0xb168a010 -> 0xc0c662aa (-1060740438)
>>    Buffer Length    1359312 (0x0014bdd0)
>> [4]
>>    Receive: 0x82a2500
>>    Data: 1 * MPI_CHAR
>>    Status           Pending
>>    Source           0 (orterun<xxxx>.0)
>>    Tag              11 (0x0000000b)
>>    User Buffer      0xafc9a010 -> 0x00000000 (0)
>>    Buffer Length    1359312 (0x0014bdd0)
>> [5]
>>    Receive: 0x82a2700
>>    Data: 1 * MPI_CHAR
>>    Status           Pending
>>    Source           0 (orterun<xxxx>.0)
>>    Tag              9 (0x00000009)
>>    User Buffer      0xb01ca010 -> 0x00000000 (0)
>>    Buffer Length    1359312 (0x0014bdd0)
>> 
>> Unexpected messages : no information available
>> Pending sends
>> [0]
>>    Send: 0x80b8500
>>    Data transfer completed
>>    Status           Complete
>>    Target           0 (orterun<xxxx>.0)
>>    Tag              4 (0x00000004)
>>    Buffer           0xb0846010 -> 0x40544279 (1079263865)
>>    Buffer Length    2548 (0x000009f4)
>> [1]
>>    Send: 0x80b8780
>>    Data transfer completed
>>    Status           Complete
>>    Target           0 (orterun<xxxx>.0)
>>    Tag              6 (0x00000006)
>>    Buffer           0xb0d76010 -> 0x41a756bf (1101485759)
>>    Buffer Length    2992 (0x00000bb0)
>> [2]
>>    Send: 0x80b8a00
>>    Data transfer completed
>>    Status           Complete
>>    Target           1 (orterun<xxxx>.1)
>>    Tag              0 (0x00000000)
>>    Buffer           0xb12a6010 -> 0xbf94cfc4 (-1080766524)
>>    Buffer Length    3856 (0x00000f10)
>> [3]
>>    Send: 0x80b8c80
>>    Data transfer completed
>>    Status           Complete
>>    Target           1 (orterun<xxxx>.1)
>>    Tag              2 (0x00000002)
>>    Buffer           0xb17d6010 -> 0x400a1a6c (1074403948)
>>    Buffer Length    3952 (0x00000f70)
>> [4]
>>    Send: 0x831f080
>>    Data transfer completed
>>    Status           Complete
>>    Target           0 (orterun<xxxx>.0)
>>    Tag              8 (0x00000008)
>>    Buffer           0xafde6010 -> 0xc0de2c50 (-1059181488)
>>    Buffer Length    3292 (0x00000cdc)
>> [5]
>>    Send: 0x831f300
>>    Data transfer completed
>>    Status           Complete
>>    Target           0 (orterun<xxxx>.0)
>>    Tag              10 (0x0000000a)
>>    Buffer           0xb0316010 -> 0x41169ca7 (1092000935)
>>    Buffer Length    3232 (0x00000ca0)
>> 
>> MPI COMMUNICATOR 8 SPLIT FROM 5
>> Comm_size                2
>> Comm_rank                0
>> Pending receives    : none
>> Unexpected messages : no information available
>> Pending sends       : none
>> 
>> MPI COMMUNICATOR 9 SPLIT FROM 5
>> Comm_size                2
>> Comm_rank                0
>> Pending receives    : none
>> Unexpected messages : no information available
>> Pending sends       : none
>> 
>> _______________________________________________
>> users mailing list
>> 
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> -- 
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.650.633.7054
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

