Dear Abe,

Open MPI provides a simple way to validate your code against the eager problem, by forcing the library to use a zero-size eager limit (so every message goes through the rendezvous protocol and is delivered only once it has been matched by a receive). First, identify the networks used by your application, and then set both btl_<network>_eager_limit and btl_<network>_rndv_eager_limit to 0 (via MCA parameters on the command line or in the configuration file).
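For example, assuming the TCP BTL is the one your application uses (substitute the BTL that actually applies; the process count and application name below are just placeholders):

  mpirun --mca btl_tcp_eager_limit 0 --mca btl_tcp_rndv_eager_limit 0 -np 4 ./your_app

or, equivalently, in $HOME/.openmpi/mca-params.conf:

  btl_tcp_eager_limit = 0
  btl_tcp_rndv_eager_limit = 0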
George.

On Wed, Nov 4, 2015 at 7:30 PM, ABE Hiroshi <hab...@gmail.com> wrote:

> Dear Dr. Bosilca and Dr. Gouaillardet,
>
> Thank you for your kind mail. I believe I now understand the problem.
>
> As described in Dr. Bosilca's mail, this should be the eager problem. To
> avoid it, we should use one of the methods suggested in Dr. Gouaillardet's
> mail.
>
> I also intend to try MPICH, but our code should work on both of the most
> popular MPI implementations.
>
> Again, thank you very much for your kind help.
>
> On Nov 5, 2015, at 0:36, George Bosilca <bosi...@icl.utk.edu> wrote:
>
> A reproducer without the receiver part has limited usability.
>
> 1) Have you checked that your code doesn't suffer from the eager problem?
> It might happen that if your message size is under the eager limit, your
> perception is that the code works, when in fact your message is just
> sitting in the unexpected queue on the receiver and will potentially never
> be matched. Conversely, when the length of the message is larger than the
> eager size (which is network dependent), the code will obviously stall in
> MPI_Wait, as the send is never matched. The latter is the expected and
> defined behavior according to the MPI standard.
>
> 2) In order to rule this out, add a lock around your sends to make sure
> that 1) a sequential version of the code is valid; and 2) we are not
> facing some consistent thread-interleaving issue. If this step completes
> successfully, then we can start looking deeper into the OMPI internals for
> a bug.
>
> George.
>
>
> On Wed, Nov 4, 2015 at 12:34 AM, ABE Hiroshi <hab...@gmail.com> wrote:
>
> [snip]
>
> Abe-san,
>
> You can be blocking on one side and non-blocking on the other side. For
> example, one task can do MPI_Send, and the other MPI_Irecv and MPI_Wait.
>
> In order to avoid deadlock, your program should do the following:
> 1. master does MPI_Isend and starts the workers
> 2. workers receive and process messages (if there is one recv per thread,
>    you can use MPI_Recv, i.e. a blocking recv)
> 3. master does MPI_Wait on the request used in MPI_Isend
> 4. do the simulation
> I do not know if some kind of synchronization is required between master
> and workers. The key point is that you MPI_Wait after the workers have
> been started.
>
> I do not know all the details of your program, but you might avoid using
> threads:
> 1. MPI_Isend
> 2. several MPI_Irecv
> 3. MPI_Waitall (or a loop with MPI_Waitany/MPI_Waitsome)
> 4. do the simulation
>
> If you really want threads, another option is to start the workers after
> MPI_Waitany/MPI_Waitsome.
>
> Once again, I do not know your full program, so I can only guess. You
> might also want to try another MPI flavor (such as MPICH), since your
> program could be correct and the deadlock might be Open MPI specific.
>
>
> ABE Hiroshi
> Three Wells, JAPAN
> http://www.3wells-computing.com/
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/11/27996.php
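A minimal C sketch of the thread-free ordering Gilles suggests (post the MPI_Isend and every MPI_Irecv first, then MPI_Waitall before the simulation) could look like the following. The all-to-all exchange pattern, buffer length, and tag are invented for illustration and are not taken from Abe-san's code:

/* Sketch of the thread-free ordering suggested above: post the
 * non-blocking sends and receives first, wait on all of them, and only
 * then start the simulation.  The all-to-all exchange pattern, message
 * length, and tag are hypothetical. */
#include <mpi.h>
#include <stdlib.h>

#define LEN 1024                    /* hypothetical message length */

int main(int argc, char **argv)
{
    int rank, size, nreq = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *sendbuf = calloc((size_t)LEN, sizeof(double));
    double *recvbuf = malloc((size_t)size * LEN * sizeof(double));
    MPI_Request *reqs = malloc(2u * (size_t)size * sizeof(MPI_Request));

    /* Steps 1 and 2: post one non-blocking send and one non-blocking
     * receive per peer -- nothing blocks here. */
    for (int peer = 0; peer < size; peer++) {
        if (peer == rank) continue;
        MPI_Isend(sendbuf, LEN, MPI_DOUBLE, peer, 0,
                  MPI_COMM_WORLD, &reqs[nreq++]);
        MPI_Irecv(recvbuf + (size_t)peer * LEN, LEN, MPI_DOUBLE, peer, 0,
                  MPI_COMM_WORLD, &reqs[nreq++]);
    }

    /* Step 3: wait only after every receive has been posted, so no send
     * can sit in a wait call looking for a match that was never posted. */
    MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);

    /* Step 4: do the simulation ... */

    free(sendbuf); free(recvbuf); free(reqs);
    MPI_Finalize();
    return 0;
}

Because every receive is posted before any wait, a message larger than the eager limit always finds a matching receive, which is exactly the condition the zero-eager-limit test above is meant to exercise.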