Dear Dr. Bosilca and Dr. Gouaillardet,

Thank you for your kind mails. I believe I have now figured out the problem.

As described in Dr. Bosilca’s mail, this should be the eager problem. To 
avoid it, we should adopt one of the methods suggested in 
Dr. Gouaillardet’s mail.

I also intend to try MPICH, but our code should work on both of the most 
popular MPI implementations.

Again, thank you very much for your kind help.

On 2015/11/05, at 0:36, George Bosilca <bosi...@icl.utk.edu> wrote:

> A reproducer without the receiver part has limited usability. 
> 
> 1) Have you checked that your code doesn't suffer from the eager problem? It 
> might happen that if your message size is under the eager limit, your 
> perception is that the code works when in fact your message is just sitting in 
> the unexpected queue on the receiver, and will potentially never be matched. 
> Conversely, when the length of the message is larger than the eager size 
> (which is network dependent), the code will obviously stall in MPI_Wait as 
> the send is never matched. The latter is the expected and defined behavior 
> according to the MPI standard.
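(To confirm my understanding of this point I wrote the small sketch below. It is 
only my own illustration, not code from this thread; the 16 MiB size is an 
assumption meant to be above any eager limit. Run with two processes, the 
MPI_Wait deliberately stalls because no receive is ever posted, while a message 
of a few bytes would instead appear to complete via the eager path and sit in 
the unexpected queue.)

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* 16 MiB -- assumed to be well above any eager limit */
    size_t n = 16u * 1024 * 1024;
    char *buf = malloc(n);

    if (rank == 0 && size > 1) {
        MPI_Request req;
        MPI_Isend(buf, (int)n, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req);
        /* Rank 1 never posts a matching receive, so a message of this size
         * goes through the rendezvous protocol and this MPI_Wait never
         * returns -- the stall described above. A tiny message would instead
         * "complete" eagerly and wait forever in rank 1's unexpected queue. */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("this line is never reached\n");
    }

    free(buf);
    MPI_Finalize();
    return 0;
}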
> 
> 2) In order to rule this out, add a lock around your sends to make sure that 
> 1) a sequential version of the code is valid; and 2) that we are not facing 
> some consistent thread-interleaving issue. If this step completes 
> successfully, then we can start looking deeper into the OMPI internals for a bug.
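(Following this advice, here is a minimal sketch of what I plan to try. It is my 
own illustration and the helper name locked_send is hypothetical: every sending 
thread takes one mutex around MPI_Isend, and rank 0 posts a matching receive for 
every message so nothing is left unmatched.)

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t send_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct { int value; MPI_Request req; } send_arg;

/* hypothetical helper: serialize the posting of the sends with one lock,
 * as suggested, to rule out thread-interleaving problems */
static void *locked_send(void *p)
{
    send_arg *a = (send_arg *)p;
    pthread_mutex_lock(&send_lock);
    MPI_Isend(&a->value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &a->req);
    pthread_mutex_unlock(&send_lock);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank != 0) {
        /* every non-root rank sends from two threads */
        pthread_t t[2];
        send_arg a[2] = { { rank * 10 + 0 }, { rank * 10 + 1 } };
        for (int i = 0; i < 2; ++i) pthread_create(&t[i], NULL, locked_send, &a[i]);
        for (int i = 0; i < 2; ++i) pthread_join(t[i], NULL);
        for (int i = 0; i < 2; ++i) MPI_Wait(&a[i].req, MPI_STATUS_IGNORE);
    } else {
        /* rank 0 receives everything, so no message stays unmatched */
        for (int i = 0; i < 2 * (size - 1); ++i) {
            int v;
            MPI_Recv(&v, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        printf("all sends matched\n");
    }

    MPI_Finalize();
    return 0;
}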
> 
>   George.
> 
> 
> On Wed, Nov 4, 2015 at 12:34 AM, ABE Hiroshi <hab...@gmail.com> wrote:
[snip]

> Abe-san,
> 
> you can be blocking on one side, and non-blocking on the other side.
> for example, one task can do MPI_Send, and the other MPI_Irecv and MPI_Wait.
> 
> in order to avoid deadlock, your program should do
> 1. master MPI_Isend and start the workers
> 2. workers receive and process messages (if there is one recv per thread, you 
> can use MPI_Recv, i.e. a blocking recv)
> 3. master MPI_Wait on the request used in MPI_Isend
> 4. do simulation
> I do not know if some kind of synchronization is required between master and 
> workers.
> the key point is that you MPI_Wait after the workers have been started.
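(Here is a minimal sketch of that ordering as I understand it, assuming a single 
worker thread doing one blocking receive; it is only my own illustration and 
requires MPI_THREAD_MULTIPLE.)

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    (void)arg;
    int msg;
    /* one receive per thread, so a blocking MPI_Recv is fine here */
    MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("worker received %d\n", msg);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* 1. master posts the non-blocking send (here: to the next rank) */
    int payload = rank;
    MPI_Request req;
    MPI_Isend(&payload, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD, &req);

    /* 2. start the worker that receives and processes the message */
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);

    /* 3. only now does the master MPI_Wait on its own send */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    /* 4. do simulation ... then join the worker */
    pthread_join(t, NULL);

    MPI_Finalize();
    return 0;
}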
> 
> I do not know all the details of your program, but you might avoid using 
> threads:
> 1. MPI_Isend
> 2. several MPI_Irecv
> 3. MPI_Waitall (or a loop with MPI_Waitany/MPI_Waitsome)
> 4. do simulation
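(And a sketch of this thread-free variant, again only my own illustration; the 
all-to-all exchange of one int and the fixed array sizes are assumptions for the 
example.)

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Request reqs[2 * 64];            /* assumes size <= 64 for this sketch */
    int sendval = rank, recvvals[64];
    int nreq = 0;

    /* 1. MPI_Isend: one int to every other rank */
    for (int peer = 0; peer < size; ++peer)
        if (peer != rank)
            MPI_Isend(&sendval, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[nreq++]);

    /* 2. several MPI_Irecv: one int from every other rank */
    for (int peer = 0; peer < size; ++peer)
        if (peer != rank)
            MPI_Irecv(&recvvals[peer], 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[nreq++]);

    /* 3. MPI_Waitall (a loop over MPI_Waitany/MPI_Waitsome would also work) */
    MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);

    /* 4. do simulation ... */
    printf("rank %d: all communication completed\n", rank);

    MPI_Finalize();
    return 0;
}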
> 
> if you really want threads, another option is to start the workers after 
> MPI_Waitany/MPI_Waitsome
> 
> once again, I do not know your full program, so I can just guess.
> you might also want to try another MPI flavor (such as MPICH), since your 
> program could be correct and the deadlock might be Open MPI specific.

ABE Hiroshi
 Three Wells, JAPAN
 http://www.3wells-computing.com/





