Gary,

Looking at the trace, it becomes clear that the culprit is the receiving process,
not the sending one.

Let me walk you through it. I split the trace into several groups. The first one
looks at the sender (proc39) and the receiver (proc28) for the first message.

PROC39: 59.2268409810; SEND_REQ_ACTIVATE, count=81536, peer=28
PROC39: 59.2268649811; SEND_REQ_XFER_BEGIN, count=81536, peer=28
PROC39: 59.2270019811; SEND_REQ_XFER_CONTINUE, count=27178, peer=28
PROC39: 59.2270509811; SEND_REQ_XFER_CONTINUE, count=54358, peer=28
PROC39: 59.2270809811; SEND_REQ_XFER_END, count=81536, peer=28
PROC39: 59.2270869811; SEND_REQ_COMPLETE, count=81536, peer=28
PROC39: 59.2270939811; SEND_REQ_NOTIFY, count=81536, peer=28

PROC28: 59.2271299611; RECV_MSG_ARRIVED, count=0, peer=39
PROC28: 59.2271389612; RECV_SEARCH_POSTED_Q_BEGIN, count=0, peer=39
PROC28: 59.2271439611; RECV_REQ_REMOVE_FROM_POSTED_Q, count=172084, peer=39
PROC28: 59.2271479612; RECV_MSG_MATCH_POSTED_REQ, count=172084, peer=39
PROC28: 59.2271519612; RECV_SEARCH_POSTED_Q_END, count=0, peer=39
PROC28: 59.2271559612; RECV_REQ_XFER_BEGIN, count=172084, peer=39
PROC28: 59.2271609611; RECV_REQ_XFER_CONTINUE, count=27178, peer=39
PROC28: 59.2271759611; RECV_REQ_XFER_CONTINUE, count=54358, peer=39
PROC28: 59.4331819612; RECV_REQ_XFER_END, count=172084, peer=39
PROC28: 59.4332009612; RECV_REQ_COMPLETE, count=172084, peer=39

You can see that the sender sends the matching information (27178 bytes, line 2).
This message arrives on the receiver at line 1. There seems to be a small clock
drift between the two nodes, so don't compare absolute times across processes.
The first piece of data (the 27178 bytes) is then copied into the receiver's
memory (line 7). As the pipeline protocol is active, the receiver asks for the
second piece of data right after (line 8). The receiver only detects the
completion of the last piece of the transfer at line 9, about 206 ms later.
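To make that number concrete, here is a tiny helper (a sketch of mine, not part
of the trace tooling) that computes the delta from the two PROC28 timestamps
above; only timestamps taken by the same process are compared, because of the
clock drift mentioned earlier.

/* Sketch, not part of the trace tooling: compute the receiver-side delta.
 * Timestamps are copied verbatim from the PROC28 lines above. */
#include <stdio.h>

int main(void)
{
    double last_continue = 59.2271759611;  /* RECV_REQ_XFER_CONTINUE, count=54358 */
    double xfer_end      = 59.4331819612;  /* RECV_REQ_XFER_END                   */

    printf("completion detected %.3f ms after the last fragment request\n",
           (xfer_end - last_continue) * 1e3);   /* prints ~206.006 ms */
    return 0;
}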

On the sender side, don't take the completion time into account; it is
irrelevant, as it only reflects the local completion (and thus merely indicates
the cost of the local memcpy).

Now on to the second communication:

PROC39: 59.2283769811; SEND_REQ_ACTIVATE, count=81536, peer=28
PROC39: 59.2283959811; SEND_REQ_XFER_BEGIN, count=81536, peer=28
PROC39: 59.4331379811; SEND_REQ_XFER_CONTINUE, count=27178, peer=28
PROC39: 59.4332049811; SEND_REQ_XFER_CONTINUE, count=54358, peer=28
PROC39: 59.4332569811; SEND_REQ_XFER_END, count=81536, peer=28
PROC39: 59.4332659811; SEND_REQ_COMPLETE, count=81536, peer=28
PROC39: 59.4332769811; SEND_REQ_NOTIFY, count=81536, peer=28

PROC28: 59.4332059611; RECV_MSG_ARRIVED, count=0, peer=39
PROC28: 59.4332109612; RECV_SEARCH_POSTED_Q_BEGIN, count=0, peer=39
PROC28: 59.4332149612; RECV_MSG_INSERT_IN_UNEX_Q, count=0, peer=39
PROC28: 59.4332199611; RECV_SEARCH_POSTED_Q_END, count=0, peer=39
PROC28: 59.4332269612; RECV_REQ_NOTIFY, count=172084, peer=39
PROC28: 59.4332539611; RECV_REQ_ACTIVATE, count=172084, peer=39
PROC28: 59.4332589612; RECV_SEARCH_UNEX_Q_BEGIN, count=172084, peer=39
PROC28: 59.4332629612; RECV_REQ_MATCH_UNEX, count=172084, peer=39
PROC28: 59.4332669611; RECV_MSG_REMOVE_FROM_UNEX_Q, count=0, peer=39
PROC28: 59.4332719612; RECV_SEARCH_UNEX_Q_END, count=172084, peer=39
PROC28: 59.4332759612; RECV_REQ_XFER_BEGIN, count=172084, peer=39
PROC28: 59.4332809611; RECV_REQ_XFER_CONTINUE, count=27178, peer=39

Here things get interesting. You can see that the matching information of the
second send (line 1 on the receiving side) arrives before the first receive is
reported back to the MPI layer (line 5). In other words, the second message is
__unexpected__ (line 3 clearly highlights this). Later, when the MPI_Irecv is
posted, it is matched (line 8) and the confirmation is sent back to the sender
at line 11. Take a look at the timestamps: the sender's second
SEND_REQ_XFER_BEGIN is at 59.2283959811, but its first SEND_REQ_XFER_CONTINUE
only happens at 59.4331379811, i.e. the sender sits for roughly 205 ms until the
receiver posts and matches the receive. This explains why the second send looks
so slow.
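The usual remedy is to make sure a receive is already posted when each message
arrives, e.g. by keeping a ring of pre-posted MPI_Irecv's. The sketch below is
only an illustration (the buffer count, tag and message loop are assumptions of
mine); the 172084-byte buffer size matches the receives in the trace.

/* Sketch, not Gary's code: keep a ring of pre-posted receives so every
 * incoming message matches a posted request instead of landing in the
 * unexpected queue.  NBUF, the tag and nmsgs are made-up values. */
#include <mpi.h>

#define NBUF    4
#define BUFSIZE 172084              /* matches the receive buffers in the trace */

static char bufs[NBUF][BUFSIZE];

void drain_messages(int sender, int nmsgs, MPI_Comm comm)
{
    MPI_Request reqs[NBUF];

    for (int i = 0; i < NBUF; i++)  /* post before the sender starts */
        MPI_Irecv(bufs[i], BUFSIZE, MPI_BYTE, sender, 0, comm, &reqs[i]);

    for (int m = 0; m < nmsgs; m++) {
        int slot = m % NBUF;
        MPI_Wait(&reqs[slot], MPI_STATUS_IGNORE);
        /* ... consume bufs[slot] here ... */
        if (m + NBUF < nmsgs)       /* immediately repost the freed slot */
            MPI_Irecv(bufs[slot], BUFSIZE, MPI_BYTE, sender, 0, comm, &reqs[slot]);
    }
}

With the receive already posted, the second message would take the same
RECV_REQ_REMOVE_FROM_POSTED_Q path as the first one, and the sender would not
stall waiting for the match.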

If there is one thing that should be understood better, it is the lag on the
receiver in detecting the completion of the first operation (as highlighted by
the following two lines):

PROC28: 59.2271759611; RECV_REQ_XFER_CONTINUE, count=54358, peer=39
PROC28: 59.4331819612; RECV_REQ_XFER_END, count=172084, peer=39
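One way to narrow this down (a diagnostic sketch of mine, under the assumption
that the receiver sits in a blocking call during that window): poll the request
with MPI_Test and timestamp when the completion is actually observed. Each
MPI_Test call drives the progress engine, so if the gap shrinks the lag is
progress-related rather than wire time.

/* Diagnostic sketch, not part of the original trace code. */
#include <mpi.h>
#include <stdio.h>

double wait_and_report(MPI_Request *req)
{
    int done = 0;
    double t0 = MPI_Wtime();

    while (!done)
        MPI_Test(req, &done, MPI_STATUS_IGNORE);   /* each call advances progress */

    double elapsed = MPI_Wtime() - t0;
    printf("completion observed after %.3f ms of polling\n", elapsed * 1e3);
    return elapsed;
}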

  george.


On Nov 9, 2012, at 12:36 , "Hodge, Gary C" <gary.c.ho...@lmco.com> wrote:

> Answering my own question, I have downloaded openMPI 1.6.2 and still get the 
> delay in the MPI_Send.  Previously, I was using openMPI 1.4.1
>  
> I configured 1.6.2 with --enable-peruse and have implemented PERUSE callbacks.
> Here is a trace of gb2 (PROC39) sending messages to ob (PROC28)
>  
> PROC39: 59.2268409810; SEND_REQ_ACTIVATE, count=81536, peer=28
> PROC39: 59.2268649811; SEND_REQ_XFER_BEGIN, count=81536, peer=28
> PROC39: 59.2270019811; SEND_REQ_XFER_CONTINUE, count=27178, peer=28
> PROC39: 59.2270509811; SEND_REQ_XFER_CONTINUE, count=54358, peer=28
> PROC39: 59.2270809811; SEND_REQ_XFER_END, count=81536, peer=28
> PROC39: 59.2270869811; SEND_REQ_COMPLETE, count=81536, peer=28
> PROC39: 59.2270939811; SEND_REQ_NOTIFY, count=81536, peer=28
> PROC28: 59.2271299611; RECV_MSG_ARRIVED, count=0, peer=39
> PROC28: 59.2271389612; RECV_SEARCH_POSTED_Q_BEGIN, count=0, peer=39
> PROC28: 59.2271439611; RECV_REQ_REMOVE_FROM_POSTED_Q, count=172084, peer=39
> PROC28: 59.2271479612; RECV_MSG_MATCH_POSTED_REQ, count=172084, peer=39
> PROC28: 59.2271519612; RECV_SEARCH_POSTED_Q_END, count=0, peer=39
> PROC28: 59.2271559612; RECV_REQ_XFER_BEGIN, count=172084, peer=39
> PROC28: 59.2271609611; RECV_REQ_XFER_CONTINUE, count=27178, peer=39
> PROC28: 59.2271759611; RECV_REQ_XFER_CONTINUE, count=54358, peer=39
> PROC39: 59.2283769811; SEND_REQ_ACTIVATE, count=81536, peer=28
> PROC39: 59.2283959811; SEND_REQ_XFER_BEGIN, count=81536, peer=28
> PROC39: 59.4331379811; SEND_REQ_XFER_CONTINUE, count=27178, peer=28
> PROC28: 59.4331819612; RECV_REQ_XFER_END, count=172084, peer=39
> PROC28: 59.4332009612; RECV_REQ_COMPLETE, count=172084, peer=39
> PROC39: 59.4332049811; SEND_REQ_XFER_CONTINUE, count=54358, peer=28
> PROC28: 59.4332059611; RECV_MSG_ARRIVED, count=0, peer=39
> PROC28: 59.4332109612; RECV_SEARCH_POSTED_Q_BEGIN, count=0, peer=39
> PROC28: 59.4332149612; RECV_MSG_INSERT_IN_UNEX_Q, count=0, peer=39
> PROC28: 59.4332199611; RECV_SEARCH_POSTED_Q_END, count=0, peer=39
> PROC28: 59.4332269612; RECV_REQ_NOTIFY, count=172084, peer=39
> PROC28: 59.4332539611; RECV_REQ_ACTIVATE, count=172084, peer=39
> PROC39: 59.4332569811; SEND_REQ_XFER_END, count=81536, peer=28
> PROC28: 59.4332589612; RECV_SEARCH_UNEX_Q_BEGIN, count=172084, peer=39
> PROC28: 59.4332629612; RECV_REQ_MATCH_UNEX, count=172084, peer=39
> PROC39: 59.4332659811; SEND_REQ_COMPLETE, count=81536, peer=28
> PROC28: 59.4332669611; RECV_MSG_REMOVE_FROM_UNEX_Q, count=0, peer=39
> PROC28: 59.4332719612; RECV_SEARCH_UNEX_Q_END, count=172084, peer=39
> PROC28: 59.4332759612; RECV_REQ_XFER_BEGIN, count=172084, peer=39
> PROC39: 59.4332769811; SEND_REQ_NOTIFY, count=81536, peer=28
> PROC28: 59.4332809611; RECV_REQ_XFER_CONTINUE, count=27178, peer=39
> PROC39: 59.4332849811; ERROR: component gb2 exceeded send time limit by 
> 0.104915 seconds.
> PROC39: 59.4332849811; WARNING: component gb2 increased page faults 
> R=130898,0; P=130898,0; S=132207,0
>  
> Notice the 205 ms delay between the hi-lighted SEND_REQ_XFER_BEGIN and the 
> subsequent SEND_REQ_XFER_CONTINUE
> Also notice that there was no such delay in the previous send request.
>  
> The last two lines are my own debug that monitor excessive time spent in an 
> MPI_Send and monitor increasing number of page faults.
> After the delaying MPI_Send, the page faults have increased by 1309 (132207 – 
> 130898) !!!
>  
> I looked at the PML code (pml_ob1_sendreq.c) and I suspect that something is 
> going awry in the mca_bml_base_prepare_src function.
> I believe that this is the function in which the openib BTL makes calls to 
> register the user memory.
>  
> This delay in the MPI_Send kills any hope of meeting our real-time 
> requirements, so any help is greatly appreciated.
>  
>  
> From: Hodge, Gary C 
> Sent: Monday, November 05, 2012 12:27 PM
> To: us...@open-mpi.org
> Subject: gathering problem
>  
> I continue to have a problem where 2 processes are sending to the same 
> process and one of the sending processes hangs for 150 to 550 ms in the call 
> to MPI_Send.
>  
> Each process runs on a different node and the receiving process has posted an 
> MPI_Irecv 17 ms before the hanging send.
> The posted receives are for 172K buffers and the sending processes are 
> sending 81K size messages.
> I have set mpi_leave_pinned to 1 and have increased the 
> btl_openib_receive_queues to …:S,65536,512,256,64
>  
> How do I trace the various phases of message passing to diagnose where the 
> send is hanging up?
>  
>  
