Abe-san, I am glad you were able to move forward.
btw, George has a Ph.D., but Sheldon Cooper would say about me that I am
only an engineer.

Cheers,

Gilles

On Saturday, November 7, 2015, ABE Hiroshi <hab...@gmail.com> wrote:

> Dear Dr. Bosilca and All,
>
> Regarding my problem, the MPI_Wait stall after MPI_Isend with large
> (over 4 kbytes) messages has been resolved by Dr. Gouaillardet's
> suggestion:
>
> 1. MPI_Isend in the master thread
> 2. Launch worker threads to receive the messages by MPI_Recv
> 3. MPI_Waitall in the master thread
>
> Thank you so much, and I will try Dr. Bosilca's suggestion; it seems I
> would need some investigation to understand it, but it is interesting
> to me.
>
> Sincerely,
> Hiroshi
>
> On 2015/11/05, at 9:58, George Bosilca <bosi...@icl.utk.edu> wrote:
>
> Dear Abe,
>
> Open MPI provides a simple way to validate your code against the eager
> problem, by forcing the library to use a 0-size eager limit (basically
> all messages are then matched). First, identify the networks used by
> your application and then set both btl_<network>_eager_limit and
> btl_<network>_rndv_eager_limit to 0 (via the MCA parameters or in the
> configuration file).
>
> George.
>
>
> On Wed, Nov 4, 2015 at 7:30 PM, ABE Hiroshi <hab...@gmail.com> wrote:
>
>> Dear Dr. Bosilca and Dr. Gouaillardet,
>>
>> Thank you for your kind mail. I believe I have figured out the problem.
>>
>> As described in Dr. Bosilca's mail, this should be the eager problem.
>> In order to avoid it, we should take one of the methods suggested in
>> Dr. Gouaillardet's mail.
>>
>> I also plan to try MPICH, but our code should work on both of the most
>> popular MPI implementations.
>>
>> Again, thank you very much for your kind help.
>>
>> On 2015/11/05, at 0:36, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>> A reproducer without the receiver part has limited usability.
>>
>> 1) Have you checked that your code doesn't suffer from the eager
>> problem? It might happen that if your message size is under the eager
>> limit, your perception is that the code works when in fact your message
>> is just on the unexpected queue on the receiver, and will potentially
>> never be matched. On the contrary, when the length of the message is
>> larger than the eager size (which is network dependent), the code will
>> obviously stall in MPI_Wait as the send is never matched. The latter is
>> the expected and defined behavior based on the MPI standard.
>>
>> 2) In order to rule this out, add a lock around your sends to make sure
>> that 1) a sequential version of the code is valid; and 2) that we are
>> not facing some consistent thread-interleaving issue. If this step
>> completes successfully, then we can start looking deeper into the OMPI
>> internals for a bug.
>>
>> George.
>>
>>
>> On Wed, Nov 4, 2015 at 12:34 AM, ABE Hiroshi <hab...@gmail.com> wrote:
>>
>> [snip]
>>
>> Abe-san,
>>
>> you can be blocking on one side, and non-blocking on the other side.
>> For example, one task can do MPI_Send, and the other MPI_Irecv and
>> MPI_Wait.
>>
>> In order to avoid deadlock, your program should do:
>> 1. master MPI_Isend and start the workers
>> 2. workers receive and process the messages (if there is one recv per
>> thread, you can do MPI_Recv, i.e. a blocking recv)
>> 3. master MPI_Wait on the request used in MPI_Isend
>> 4. do simulation
>>
>> I do not know if some kind of synchronization is required between
>> master and workers.
>> The key point is that you MPI_Wait after the workers have been started.
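
A minimal sketch of the sequence quoted above (MPI_Isend in the master
thread, blocking MPI_Recv in worker threads started afterwards, then
MPI_Waitall) might look like the following. This is not code from the
thread; NUM_WORKERS, MSG_LEN, TAG and worker_recv are made-up names for
illustration, and error handling is omitted.

/* Sketch of the pattern: the master thread posts MPI_Isend, starts
 * worker threads that each do a blocking MPI_Recv, and only then calls
 * MPI_Waitall.  Run with exactly 2 ranks; NUM_WORKERS, MSG_LEN and TAG
 * are made-up values for illustration. */
#include <mpi.h>
#include <pthread.h>
#include <stdlib.h>

#define NUM_WORKERS 4
#define MSG_LEN     8192      /* deliberately above a typical eager limit */
#define TAG         42

static int peer;              /* the rank we exchange messages with */

static void *worker_recv(void *arg)
{
    (void)arg;
    double *buf = malloc(MSG_LEN * sizeof(double));
    /* 2. each worker thread does one blocking receive */
    MPI_Recv(buf, MSG_LEN, MPI_DOUBLE, peer, TAG, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    free(buf);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);   /* this pattern needs MPI_THREAD_MULTIPLE */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;

    double sendbuf[NUM_WORKERS][MSG_LEN];
    MPI_Request reqs[NUM_WORKERS];
    pthread_t tid[NUM_WORKERS];

    /* 1. master thread posts the non-blocking sends and starts the workers */
    for (int i = 0; i < NUM_WORKERS; i++)
        MPI_Isend(sendbuf[i], MSG_LEN, MPI_DOUBLE, peer, TAG,
                  MPI_COMM_WORLD, &reqs[i]);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&tid[i], NULL, worker_recv, NULL);

    /* 3. master waits only after the receiving threads have been started */
    MPI_Waitall(NUM_WORKERS, reqs, MPI_STATUSES_IGNORE);

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tid[i], NULL);

    /* 4. do simulation ... */
    MPI_Finalize();
    return 0;
}

Build with mpicc and launch with mpirun -np 2. The key point, as Gilles
notes, is that MPI_Waitall is reached only after the receiving threads
exist, so sends above the eager limit can be matched and complete.
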
>> I do not know all the details of your program, but you might avoid
>> using threads:
>> 1. MPI_Isend
>> 2. several MPI_Irecv
>> 3. MPI_Waitall (or a loop with MPI_Waitany/MPI_Waitsome)
>> 4. do simulation
>>
>> If you really want threads, another option is to start the workers
>> after MPI_Waitany/MPI_Waitsome.
>>
>> Once again, I do not know your full program, so I can only guess.
>> You might also want to try another MPI flavor (such as MPICH), since
>> your program could be correct and the deadlock might be Open MPI
>> specific.
>
> ABE Hiroshi
> Three Wells, JAPAN
> http://www.3wells-computing.com/
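
For completeness, a sketch of the non-threaded alternative Gilles outlines
above (post MPI_Isend and several MPI_Irecv up front, then MPI_Waitall).
Again this is illustrative only, with made-up message count, size and tag,
and no error handling.

/* Sketch of the non-threaded alternative: post all non-blocking sends
 * and receives first, then a single MPI_Waitall.  Run with 2 ranks;
 * NUM_MSGS, MSG_LEN and TAG are made-up values for illustration. */
#include <mpi.h>

#define NUM_MSGS 4
#define MSG_LEN  8192
#define TAG      42

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int peer = 1 - rank;

    static double sendbuf[NUM_MSGS][MSG_LEN], recvbuf[NUM_MSGS][MSG_LEN];
    MPI_Request reqs[2 * NUM_MSGS];

    /* 1. MPI_Isend and 2. several MPI_Irecv, all posted up front */
    for (int i = 0; i < NUM_MSGS; i++) {
        MPI_Isend(sendbuf[i], MSG_LEN, MPI_DOUBLE, peer, TAG,
                  MPI_COMM_WORLD, &reqs[i]);
        MPI_Irecv(recvbuf[i], MSG_LEN, MPI_DOUBLE, peer, TAG,
                  MPI_COMM_WORLD, &reqs[NUM_MSGS + i]);
    }

    /* 3. wait for everything (a loop with MPI_Waitany/MPI_Waitsome also works) */
    MPI_Waitall(2 * NUM_MSGS, reqs, MPI_STATUSES_IGNORE);

    /* 4. do simulation ... */
    MPI_Finalize();
    return 0;
}

Because every receive is posted before the wait, no send above the eager
limit is left unmatched, regardless of the eager size in use.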