On Jan 21, 2016, at 7:40 AM, Eva <wuzh...@gmail.com> wrote:
> 
> Thanks Jeff.
> 
> >>1. Can you create a small example to reproduce the problem? 
> 
> >>2. The TCP and verbs-based transports use different thresholds and 
> >>protocols, and can sometimes bring to light errors in the application 
> >>(e.g., the application is making assumptions that just happen to be true 
> >>for TCP, but not necessarily for other transports). 
> 
> >>3. Is your program multi-threaded? If so, MPI_THREAD_MULTIPLE support in 
> >>the v1.8 and v1.10 series is not fully baked. 
> 
> >>4. Additionally, if you have buffering / matching / progression assumptions 
> >>in your application, you might accidentally block. An experiment to try 
> >>is to convert all MPI_SEND and MPI_ISEND to MPI_SSEND and MPI_ISSEND, 
> >>respectively, and see if your program still functions properly on TCP. 
> 
> 1. I will try to create a small example to reproduce the problem.
> 
> 2. I don't quite follow. I didn't make any assumptions specific to TCP. Is
> there any difference in how MPI behaves over TCP versus RDMA?

The way (Open) MPI communicates under the covers with TCP and other transports 
is different -- e.g., the amount of buffering is different, the eager sizes are 
different, etc.  Hence, if your application does an unsafe communication 
pattern (e.g., example 3.9 in MPI-3.1, page 43), it may coincidentally work on 
one transport and deadlock on another.
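A toy model makes the eager-size point concrete. The C sketch below uses a deliberately simplified protocol, not Open MPI's actual implementation. A blocking send of at most `eager_limit` bytes is buffered at the receiver and returns immediately. A larger send blocks until the destination posts a matching receive. Eager buffering is treated as unlimited. Every name here (`run`, `send_first_exchange`, `ordered_exchange`, `ANY_SRC`, `REPLY`) is hypothetical, and the byte counts mentioned afterwards are illustrative, not real transport thresholds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of blocking eager vs. rendezvous sends (hypothetical names,
 * not Open MPI internals). nranks must not exceed MAX_RANKS. */
#define MAX_RANKS 8
#define MAX_OPS 8
#define ANY_SRC (-1) /* stands in for MPI_ANY_SOURCE */
#define REPLY (-2)   /* send to the source of the last completed receive */

typedef enum { SEND, RECV } kind_t;
typedef struct { kind_t kind; int peer; size_t bytes; } op_t;
typedef struct { int nops; op_t ops[MAX_OPS]; } prog_t;

typedef struct {
    const prog_t *prog;
    int nranks;
    int pc[MAX_RANKS];                /* index of each rank's current op */
    int last_src[MAX_RANKS];          /* source matched by the last receive */
    int queued[MAX_RANKS][MAX_RANKS]; /* queued[dst][src]: buffered eager msgs */
} sim_t;

static bool done(const sim_t *s, int r) { return s->pc[r] >= s->prog[r].nops; }
static const op_t *cur(const sim_t *s, int r) { return &s->prog[r].ops[s->pc[r]]; }
static bool matches(int want, int src) { return want == ANY_SRC || want == src; }

static int dest(const sim_t *s, int r) {
    int peer = cur(s, r)->peer;
    if (peer == REPLY) {
        assert(s->last_src[r] >= 0); /* REPLY is only valid after a receive */
        return s->last_src[r];
    }
    return peer;
}

/* Try to complete rank r's current blocking receive. */
static bool try_recv(sim_t *s, int r, size_t eager_limit) {
    int want = cur(s, r)->peer;
    /* Already-buffered eager messages are matched first. */
    for (int src = 0; src < s->nranks; src++) {
        if (matches(want, src) && s->queued[r][src] > 0) {
            s->queued[r][src]--;
            s->last_src[r] = src;
            s->pc[r]++;
            return true;
        }
    }
    /* Otherwise pair with a sender blocked in a rendezvous send to r. */
    for (int src = 0; src < s->nranks; src++) {
        if (matches(want, src) && !done(s, src) && cur(s, src)->kind == SEND &&
            cur(s, src)->bytes > eager_limit && dest(s, src) == r) {
            s->last_src[r] = src;
            s->pc[r]++;
            s->pc[src]++; /* the blocked send completes with the receive */
            return true;
        }
    }
    return false;
}

/* Returns true if every rank finishes, false if the model deadlocks. */
bool run(const prog_t *prog, int nranks, size_t eager_limit) {
    sim_t s = { .prog = prog, .nranks = nranks };
    for (int r = 0; r < nranks; r++) s.last_src[r] = -1;
    for (;;) {
        bool all_done = true, progress = false;
        for (int r = 0; r < nranks; r++) {
            if (done(&s, r)) continue;
            all_done = false;
            const op_t *o = cur(&s, r);
            if (o->kind == SEND && o->bytes <= eager_limit) {
                s.queued[dest(&s, r)][r]++; /* eager: buffered, send returns */
                s.pc[r]++;
                progress = true;
            } else if (o->kind == RECV && try_recv(&s, r, eager_limit)) {
                progress = true;
            } /* a rendezvous send waits here until its receiver matches it */
        }
        if (all_done) return true;
        if (!progress) return false;
    }
}

/* Unsafe: both ranks do a blocking send first, then receive. */
bool send_first_exchange(size_t bytes, size_t eager_limit) {
    prog_t p[2] = {
        { 2, { { SEND, 1, bytes }, { RECV, 1, bytes } } },
        { 2, { { SEND, 0, bytes }, { RECV, 0, bytes } } },
    };
    return run(p, 2, eager_limit);
}

/* Safe: rank 0 sends then receives; rank 1 receives then sends. */
bool ordered_exchange(size_t bytes, size_t eager_limit) {
    prog_t p[2] = {
        { 2, { { SEND, 1, bytes }, { RECV, 1, bytes } } },
        { 2, { { RECV, 0, bytes }, { SEND, 0, bytes } } },
    };
    return run(p, 2, eager_limit);
}
```

Under this model, `send_first_exchange(64, 4096)` completes while `send_first_exchange(65536, 4096)` deadlocks. The same code "works" or hangs depending only on the eager limit, which is exactly the kind of thing that differs between transports. Setting `eager_limit` to 0 roughly mimics the MPI_SSEND experiment, because every nonzero-size send then waits for its matching receive. Under that setting the send-first exchange deadlocks for any nonzero message size, while `ordered_exchange` still completes.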

> 3. My program doesn't enable MPI_THREAD_MULTIPLE.
> 
> 4. What do you mean by buffering / matching / progression assumptions in
> your application?

Essentially the same thing I said above -- see example 3.9 in MPI-3.1.

> My program communicates like this:
> 
> 4 processes: process0, 1, 2, 3
> 
> process1/process3:
>   foreach to_id in process0, process2:
>       MPI_Send(send_buf, sendlen, to_id, TAG);
>       MPI_Recv(recv_buf, recvlen, to_id, TAG);
> 
> process0/process2:
>   while (true):
>       MPI_Recv(recv_buf, MPI_ANY_SOURCE, TAG);
>       MPI_Send(send_buf, source_id, TAG);

I'm afraid I can't grok what the problem would be from this; it's not enough of 
a summary describing what your application is doing.

If you can replicate the problem in a small example that you can share with the 
list, that would be most helpful.

Also, try the SEND -> SSEND and ISEND -> ISSEND experiment I mentioned in my 
previous mail.

Thanks!

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
