Abe-san,

I am glad you were able to move forward.

btw, George has a Ph.D., but Sheldon Cooper would say that I am only an
engineer.

Cheers,

Gilles

On Saturday, November 7, 2015, ABE Hiroshi <hab...@gmail.com> wrote:

> Dear Dr. Bosilca and All,
>
> Regarding my problem, the MPI_Wait stall after MPI_Isend with large (over
> 4 kbytes) messages has been resolved by Dr. Gouaillardet's suggestion:
>
> 1. MPI_Isend in the master thread
> 2. Launch worker threads to receive the messages by MPI_Recv
> 3. MPI_Waitall in the master thread.
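>
> A minimal sketch of this pattern in C (illustrative only: the message size,
> tag, single worker thread, and ring-style peers are placeholders, not our
> actual code; it also assumes MPI_THREAD_MULTIPLE is available):
>
>   #include <mpi.h>
>   #include <pthread.h>
>
>   #define TAG 0
>   #define N   8192              /* 8192 ints = 32 kB, above the eager limit */
>
>   static void *worker(void *arg) {
>       int src = *(int *)arg;
>       static int buf[N];
>       /* 2. blocking receive in a worker thread (one recv per thread) */
>       MPI_Recv(buf, N, MPI_INT, src, TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>       return NULL;
>   }
>
>   int main(int argc, char **argv) {
>       int provided, rank, size;
>       MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>       /* a real code should check provided >= MPI_THREAD_MULTIPLE here */
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>       static int sendbuf[N];
>       int dst = (rank + 1) % size, src = (rank + size - 1) % size;
>       MPI_Request req;
>       pthread_t tid;
>
>       /* 1. MPI_Isend in the master thread */
>       MPI_Isend(sendbuf, N, MPI_INT, dst, TAG, MPI_COMM_WORLD, &req);
>
>       /* launch the worker thread that does the MPI_Recv */
>       pthread_create(&tid, NULL, worker, &src);
>
>       /* 3. MPI_Waitall in the master thread */
>       MPI_Waitall(1, &req, MPI_STATUSES_IGNORE);
>
>       pthread_join(tid, NULL);
>       MPI_Finalize();
>       return 0;
>   }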
>
> Thank you so much. I will also try Dr. Bosilca's suggestion; it seems I
> would need some investigation to understand it, but it is interesting to
> me.
>
> Sincerely,
> Hiroshi
>
> On 2015/11/05 9:58, George Bosilca <bosi...@icl.utk.edu> wrote:
>
> Dear Abe,
>
> Open MPI provides a simple way to validate your code against the eager
> problem, by forcing the library to use a zero-size eager limit (so that
> essentially every message must be matched before the send can complete).
> First, identify the networks used by your application and then set both
> btl_<network>_eager_limit and btl_<network>_rndv_eager_limit to 0 (via the
> MCA parameters or in the configuration file).
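>
> For example, assuming the TCP BTL is the one in use (the BTL name, process
> count, and executable below are placeholders):
>
>   mpirun --mca btl_tcp_eager_limit 0 --mca btl_tcp_rndv_eager_limit 0 \
>          -np 4 ./your_app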
>
>   George.
>
>
> On Wed, Nov 4, 2015 at 7:30 PM, ABE Hiroshi <hab...@gmail.com> wrote:
>
>> Dear Dr. Bosilca and Dr. Gouaillardet,
>>
>> Thank you for your kind mail. I believe I have figured out the problem.
>>
>> As described in Dr. Bosilca's mail, this should be the eager problem. In
>> order to avoid it, we should take one of the approaches suggested in Dr.
>> Gouaillardet's mail.
>>
>> I also suppose I could try MPICH, but our code should work on both of the
>> most popular MPI implementations.
>>
>> Again, thank you very much for your kind help.
>>
>> On 2015/11/05 0:36, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>> A reproducer without the receiver part has limited usability.
>>
>> 1) Have you checked that your code doesn't suffer from the eager problem?
>> If your message size is under the eager limit, your perception may be that
>> the code works, when in fact the message is just sitting in the unexpected
>> queue on the receiver and may never be matched. Conversely, when the
>> message is larger than the eager size (which is network dependent), the
>> code will obviously stall in MPI_Wait, as the send is never matched. The
>> latter is the expected, well-defined behavior according to the MPI
>> standard.
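>>
>> As a tiny illustration of this symptom (deliberately broken code, run with
>> 2 ranks; the sizes are assumptions relative to a typical eager limit, and
>> none of this is from the original program):
>>
>>   #include <mpi.h>
>>
>>   int main(int argc, char **argv) {
>>       MPI_Init(&argc, &argv);
>>       int rank;
>>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>       if (rank == 0) {                      /* rank 1 never posts a receive */
>>           static char small[64], large[1 << 20];
>>           MPI_Request req;
>>           MPI_Isend(small, sizeof(small), MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req);
>>           MPI_Wait(&req, MPI_STATUS_IGNORE); /* completes: sent eagerly, parked
>>                                                 in rank 1's unexpected queue   */
>>           MPI_Isend(large, sizeof(large), MPI_CHAR, 1, 1, MPI_COMM_WORLD, &req);
>>           MPI_Wait(&req, MPI_STATUS_IGNORE); /* stalls: above the eager limit,
>>                                                 the send is never matched      */
>>       }
>>       MPI_Finalize();
>>       return 0;
>>   }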
>>
>> 2) In order to rule this out, add a lock around your sends to make sure
>> that 1) a sequential version of the code is valid; and 2) we are not
>> facing some consistent thread-interleaving issue. If this step completes
>> successfully, then we can start looking deeper into the OMPI internals for
>> a bug.
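>>
>> For instance, something along these lines (a sketch only; send_to_peer and
>> its arguments are made up, and the lock is purely a diagnostic step):
>>
>>   #include <mpi.h>
>>   #include <pthread.h>
>>
>>   /* one global lock serializing the sends issued from the worker threads */
>>   static pthread_mutex_t send_lock = PTHREAD_MUTEX_INITIALIZER;
>>
>>   void send_to_peer(const void *buf, int count, int peer, int tag,
>>                     MPI_Request *req)
>>   {
>>       pthread_mutex_lock(&send_lock);
>>       MPI_Isend(buf, count, MPI_CHAR, peer, tag, MPI_COMM_WORLD, req);
>>       pthread_mutex_unlock(&send_lock);
>>   }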
>>
>>   George.
>>
>>
>> On Wed, Nov 4, 2015 at 12:34 AM, ABE Hiroshi <hab...@gmail.com> wrote:
>>
>> [snip]
>>
>> Abe-san,
>>
>> You can be blocking on one side and non-blocking on the other side.
>> For example, one task can do MPI_Send, and the other MPI_Irecv and
>> MPI_Wait.
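>>
>> For example (a trivial two-rank illustration, unrelated to your actual
>> code):
>>
>>   #include <mpi.h>
>>
>>   int main(int argc, char **argv) {
>>       MPI_Init(&argc, &argv);
>>       int rank, buf = 42;
>>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>       if (rank == 0) {
>>           MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);        /* blocking */
>>       } else if (rank == 1) {
>>           MPI_Request req;
>>           MPI_Irecv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req); /* non-blocking */
>>           MPI_Wait(&req, MPI_STATUS_IGNORE);
>>       }
>>       MPI_Finalize();
>>       return 0;
>>   }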
>>
>> In order to avoid deadlock, your program should do:
>> 1. master: MPI_Isend, then start the workers
>> 2. workers: receive and process the messages (if there is one recv per
>> thread, you can use MPI_Recv, i.e. a blocking recv)
>> 3. master: MPI_Wait on the request used in MPI_Isend
>> 4. do the simulation
>> I do not know whether some kind of synchronization is required between the
>> master and the workers.
>> The key point is that you call MPI_Wait after the workers have been
>> started.
>>
>> I do not know all the details of your program, but you might be able to
>> avoid using threads altogether:
>> 1. MPI_Isend
>> 2. several MPI_Irecv
>> 3. MPI_Waitall (or a loop with MPI_Waitany/MPI_Waitsome)
>> 4. do the simulation
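>>
>> A rough sketch of that shape (the all-to-all pattern, message length, and
>> buffers are placeholders, since I do not know your actual communication
>> pattern):
>>
>>   #include <mpi.h>
>>   #include <stdlib.h>
>>
>>   #define LEN 8192   /* placeholder message length, above the eager limit */
>>
>>   int main(int argc, char **argv) {
>>       MPI_Init(&argc, &argv);
>>       int rank, size, nreqs = 0;
>>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>       MPI_Comm_size(MPI_COMM_WORLD, &size);
>>
>>       char *sendbuf = calloc(LEN, 1);
>>       char *recvbuf = calloc((size_t)size * LEN, 1);
>>       MPI_Request *reqs = malloc(2 * (size_t)size * sizeof(MPI_Request));
>>
>>       /* 1. MPI_Isend: here, one message to every other rank */
>>       for (int p = 0; p < size; p++)
>>           if (p != rank)
>>               MPI_Isend(sendbuf, LEN, MPI_CHAR, p, 0,
>>                         MPI_COMM_WORLD, &reqs[nreqs++]);
>>
>>       /* 2. several MPI_Irecv, one per expected message */
>>       for (int p = 0; p < size; p++)
>>           if (p != rank)
>>               MPI_Irecv(recvbuf + (size_t)p * LEN, LEN, MPI_CHAR, p, 0,
>>                         MPI_COMM_WORLD, &reqs[nreqs++]);
>>
>>       /* 3. MPI_Waitall (or a loop with MPI_Waitany/MPI_Waitsome) */
>>       MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);
>>
>>       /* 4. do the simulation ... */
>>
>>       free(reqs); free(recvbuf); free(sendbuf);
>>       MPI_Finalize();
>>       return 0;
>>   }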
>>
>> If you really want threads, another option is to start the workers after
>> MPI_Waitany/MPI_Waitsome.
>>
>> Once again, I do not know your full program, so I can only guess.
>> You might also want to try another MPI flavor (such as MPICH), since your
>> program could be correct and the deadlock might be Open MPI specific.
>>
>>
> ABE Hiroshi
>  Three Wells, JAPAN
>  http://www.3wells-computing.com/
>
