Dear Gilles-san and all,

I thought MPI_Isend kept the sent data queued up somewhere until the corresponding MPI_Irecv was posted. The MPI-related flow of my code is:

1. send ALL tagged messages to the other node (MPI_Isend) in the master thread, then launch the worker threads,
2. receive the corresponding tagged messages from the other node (MPI_Irecv) in the worker threads, and
3. run the simulation.
Doesn't that work? How silly I was. I coded several sample programs but I couldn't find the problem.

So, should I understand that both MPI_Send/Recv and MPI_Isend/Irecv must be matched sequentially, just like PUSH/POP on a stack? With my simulation algorithm the order of sends and receives cannot be sequential by default. In that case, how should I structure the MPI messaging? Should the messages be sent to the destination node in a fixed order first?

Thank you in advance for your suggestions.

Sincerely,
Hiroshi ABE

On 2015/11/04 18:10, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> Abe-san,
>
> MPI_Isend followed by MPI_Wait is equivalent to MPI_Send.
>
> Depending on message size and in-flight messages, that can deadlock if two
> tasks send to each other and no recv has been posted.
>
> Cheers,
>
> Gilles
>
> ABE Hiroshi <hab...@gmail.com> wrote:
>> Dear All,
>>
>> I installed openmpi 1.10.0 and gcc-5.2 using Fink (http://www.finkproject.org),
>> but nothing changed with my code.
>>
>> Regarding the MPI_Finalize error in my previous mail, it was my fault. I had
>> manually removed all MPI files from /usr/local/, and after openmpi-1.10.0 was
>> installed the error message no longer appeared. Probably some files from an
>> old openmpi version were still left there.
>>
>> Anyway, I found the cause of my problem. The code is:
>>
>> void
>> Block::MPISendEqualInterChangeData( DIRECTION dir, int rank, int id )
>> {
>>     GetEqualInterChangeData( dir, cf[0] );
>>
>>     int N  = GetNumGrid();
>>     int nb = 6*N*N*1;
>>     nb = 1010;
>>     // float *buf = new float[ nb ];
>>     float *buf = (float *)malloc( sizeof(float)*nb );
>>     for( int i = 0; i < nb; i++ ) buf[i] = 0.0;
>>
>>     MPI_Request req;
>>     MPI_Status  status;
>>
>>     int tag = 100 * id + (int)dir;
>>
>>     MPI_Isend( buf, nb, MPI_REAL4, rank, tag, MPI_COMM_WORLD, &req );
>>     MPI_Wait( &req, &status );
>>
>>     // delete [] buf;
>>     free( buf );
>> }
>>
>> This works.
>> If the "nb" value is increased beyond 1010, MPI_Wait stalls. This suggests
>> an upper limit for MPI_Isend of 4 x 1010 = 4040 bytes.
>>
>> If this is true, is there any way to increase it? I suspect this should not
>> be so, and that something is wrong with my system.
>>
>> Any ideas and suggestions are really appreciated.
>>
>> Thank you.
>>
>> On 2015/11/03 8:05, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>>
>>> On Oct 29, 2015, at 10:24 PM, ABE Hiroshi <hab...@gmail.com> wrote:
>>>>
>>>> Regarding the code I mentioned in my original mail, the behaviour is very
>>>> weird: when MPI_Isend is called from a differently named function, it works.
>>>> I also wrote a sample program to try to reproduce my problem, but it works
>>>> fine, except for the MPI_Finalize problem.
>>>>
>>>> So I decided to build gcc-5.2 and build openmpi with it, which seems to be
>>>> the recommendation of the Fink project.
>>>
>>> OK. Per the prior mail, if you can make a small reproducer, that would be
>>> most helpful in tracking down the issue.
>>>
>>> Thanks!
>>
>> ABE Hiroshi from Tokorozawa, JAPAN
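On the 4040-byte threshold: in Open MPI 1.x the shared-memory transport switches from eager delivery to a rendezvous protocol at the `btl_sm_eager_limit` MCA parameter, whose default of 4096 bytes is consistent with the observed 1010 floats plus message header. Below the limit a send can complete before the receive is posted; above it, MPI_Wait blocks until the receiver posts a matching MPI_Irecv. The parameter can be inspected and raised as below, though this is a tuning fragment, not a fix, since correctness should never rely on eager buffering; posting the receives first is the real solution. The commands assume the `sm` BTL is the transport in use:

```shell
# Show the current eager limit of the shared-memory BTL
ompi_info --param btl sm --level 9 | grep eager_limit

# Run with a larger eager limit (in bytes) -- a workaround only;
# a correct program must not depend on eager buffering
mpirun -np 2 --mca btl_sm_eager_limit 65536 ./a.out
```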