Dear Abe,

Open MPI provides a simple way to validate your code against the eager problem, by forcing the library to use a zero-size eager limit (so every message goes through the rendezvous protocol and is delivered only once it has been matched by a receive). First, identify the networks used by your application, and then set both btl_<network>_eager_limit and btl_<network>_rndv_eager_limit to 0 (via MCA parameters on the command line or in the configuration file).
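For example, assuming the TCP BTL is the one your application uses (substitute the BTL that actually applies; the process count and application name below are just placeholders):

  mpirun --mca btl_tcp_eager_limit 0 --mca btl_tcp_rndv_eager_limit 0 -np 4 ./your_app

or, equivalently, in $HOME/.openmpi/mca-params.conf:

  btl_tcp_eager_limit = 0
  btl_tcp_rndv_eager_limit = 0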
George.

On Wed, Nov 4, 2015 at 7:30 PM, ABE Hiroshi <hab...@gmail.com> wrote:

> Dear Dr. Bosilca and Dr. Gouaillardet,
>
> Thank you for your kind mail. I believe I now understand the problem.
>
> As described in Dr. Bosilca's mail, this should be the eager problem. To
> avoid it, we should use one of the methods suggested in Dr. Gouaillardet's
> mail.
>
> I also intend to try MPICH, but our code should work on both of the most
> popular MPI implementations.
>
> Again, thank you very much for your kind help.
>
> On Nov 5, 2015, at 0:36, George Bosilca <bosi...@icl.utk.edu> wrote:
>
> A reproducer without the receiver part has limited usability.
>
> 1) Have you checked that your code doesn't suffer from the eager problem?
> It might happen that if your message size is under the eager limit, your
> perception is that the code works, when in fact your message is just
> sitting in the unexpected queue on the receiver and will potentially never
> be matched. Conversely, when the length of the message is larger than the
> eager size (which is network dependent), the code will obviously stall in
> MPI_Wait, as the send is never matched. The latter is the expected and
> defined behavior according to the MPI standard.
>
> 2) In order to rule this out, add a lock around your sends to make sure
> that 1) a sequential version of the code is valid; and 2) we are not
> facing some consistent thread-interleaving issue. If this step completes
> successfully, then we can start looking deeper into the OMPI internals for
> a bug.
>
> George.
>
>
> On Wed, Nov 4, 2015 at 12:34 AM, ABE Hiroshi <hab...@gmail.com> wrote:
>
> [snip]
>
> Abe-san,
>
> You can be blocking on one side and non-blocking on the other side. For
> example, one task can do MPI_Send, and the other MPI_Irecv and MPI_Wait.
>
> In order to avoid deadlock, your program should do the following:
> 1. master does MPI_Isend and starts the workers
> 2. workers receive and process messages (if there is one recv per thread,
>    you can use MPI_Recv, i.e. a blocking recv)
> 3. master does MPI_Wait on the request used in MPI_Isend
> 4. do the simulation
> I do not know if some kind of synchronization is required between master
> and workers. The key point is that you MPI_Wait after the workers have
> been started.
>
> I do not know all the details of your program, but you might avoid using
> threads:
> 1. MPI_Isend
> 2. several MPI_Irecv
> 3. MPI_Waitall (or a loop with MPI_Waitany/MPI_Waitsome)
> 4. do the simulation
>
> If you really want threads, another option is to start the workers after
> MPI_Waitany/MPI_Waitsome.
>
> Once again, I do not know your full program, so I can only guess. You
> might also want to try another MPI flavor (such as MPICH), since your
> program could be correct and the deadlock might be Open MPI specific.
>
>
> ABE Hiroshi
> Three Wells, JAPAN
> http://www.3wells-computing.com/
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/11/27996.php
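A minimal C sketch of the thread-free ordering Gilles suggests (post the MPI_Isend and every MPI_Irecv first, then MPI_Waitall before the simulation) could look like the following. The all-to-all exchange pattern, buffer length, and tag are invented for illustration and are not taken from Abe-san's code:

/* Sketch of the thread-free ordering suggested above: post the
 * non-blocking sends and receives first, wait on all of them, and only
 * then start the simulation.  The all-to-all exchange pattern, message
 * length, and tag are hypothetical. */
#include <mpi.h>
#include <stdlib.h>

#define LEN 1024                    /* hypothetical message length */

int main(int argc, char **argv)
{
    int rank, size, nreq = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *sendbuf = calloc((size_t)LEN, sizeof(double));
    double *recvbuf = malloc((size_t)size * LEN * sizeof(double));
    MPI_Request *reqs = malloc(2u * (size_t)size * sizeof(MPI_Request));

    /* Steps 1 and 2: post one non-blocking send and one non-blocking
     * receive per peer -- nothing blocks here. */
    for (int peer = 0; peer < size; peer++) {
        if (peer == rank) continue;
        MPI_Isend(sendbuf, LEN, MPI_DOUBLE, peer, 0,
                  MPI_COMM_WORLD, &reqs[nreq++]);
        MPI_Irecv(recvbuf + (size_t)peer * LEN, LEN, MPI_DOUBLE, peer, 0,
                  MPI_COMM_WORLD, &reqs[nreq++]);
    }

    /* Step 3: wait only after every receive has been posted, so no send
     * can sit in a wait call looking for a match that was never posted. */
    MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);

    /* Step 4: do the simulation ... */

    free(sendbuf); free(recvbuf); free(reqs);
    MPI_Finalize();
    return 0;
}

Because every receive is posted before any wait, a message larger than the eager limit always finds a matching receive, which is exactly the condition the zero-eager-limit test above is meant to exercise.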