Abe-san,

You can be blocking on one side and non-blocking on the other.
For example, one task can do MPI_Send while the other does MPI_Irecv followed
by MPI_Wait, as in the sketch below.
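Here is a tiny self-contained example of that mix (the buffer size, tag and
ranks are just placeholders, not taken from your code):

  #include <mpi.h>
  #include <vector>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      std::vector<float> buf(1010, 0.0f);
      if (rank == 0) {
          // blocking send on one side ...
          MPI_Send(buf.data(), (int)buf.size(), MPI_FLOAT, 1, 0,
                   MPI_COMM_WORLD);
      } else if (rank == 1) {
          // ... non-blocking recv plus wait on the other side
          MPI_Request req;
          MPI_Irecv(buf.data(), (int)buf.size(), MPI_FLOAT, 0, 0,
                    MPI_COMM_WORLD, &req);
          MPI_Wait(&req, MPI_STATUS_IGNORE);
      }
      MPI_Finalize();
      return 0;
  }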

To avoid deadlock, your program should do:
1. master: MPI_Isend, then start the workers
2. workers: receive and process messages (if there is one recv per thread,
you can do MPI_Recv, i.e. a blocking recv)
3. master: MPI_Wait on the request used in MPI_Isend
4. do simulation
I do not know if some kind of synchronization is required between master
and workers.
The key point is that you call MPI_Wait only after the workers have been
started; see the sketch below.
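A minimal sketch of that ordering (std::thread, the peer rank, tag and buffer
size are placeholders; it assumes MPI was initialized with
MPI_THREAD_MULTIPLE, since the workers call MPI from their own threads):

  #include <mpi.h>
  #include <thread>
  #include <vector>

  void worker(int peer, int tag) {
      std::vector<float> rbuf(1010);
      // 2. one recv per worker thread, so a blocking MPI_Recv is fine
      MPI_Recv(rbuf.data(), (int)rbuf.size(), MPI_FLOAT, peer, tag,
               MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      // ... process the message ...
  }

  void master(int peer, int tag) {
      std::vector<float> sbuf(1010, 0.0f);
      MPI_Request req;
      // 1. post the send first ...
      MPI_Isend(sbuf.data(), (int)sbuf.size(), MPI_FLOAT, peer, tag,
                MPI_COMM_WORLD, &req);
      // ... then start the worker ...
      std::thread w(worker, peer, tag);
      // 3. ... and MPI_Wait only after the worker has been started
      MPI_Wait(&req, MPI_STATUS_IGNORE);
      w.join();
      // 4. do simulation
  }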

I do not know all the details of your program, but you might be able to
avoid using threads altogether (see the sketch after this list):
1. MPI_Isend
2. several MPI_Irecv
3. MPI_Waitall (or a loop with MPI_Waitany/MPI_Waitsome)
4. do simulation
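A minimal single-threaded sketch of that, under the same placeholder
assumptions (peer rank, tags, number of recvs and buffer sizes are mine,
not yours):

  #include <mpi.h>
  #include <vector>

  void exchange(int peer) {
      const int nb = 1010, nrecv = 4;   // placeholders
      std::vector<float> sbuf(nb, 0.0f);
      std::vector<std::vector<float> > rbuf(nrecv, std::vector<float>(nb));
      std::vector<MPI_Request> reqs(nrecv + 1);

      // 1. MPI_Isend
      MPI_Isend(sbuf.data(), nb, MPI_FLOAT, peer, 0, MPI_COMM_WORLD,
                &reqs[0]);
      // 2. several MPI_Irecv
      for (int i = 0; i < nrecv; i++)
          MPI_Irecv(rbuf[i].data(), nb, MPI_FLOAT, peer, 100 + i,
                    MPI_COMM_WORLD, &reqs[1 + i]);
      // 3. MPI_Waitall (or a loop with MPI_Waitany/MPI_Waitsome)
      MPI_Waitall(nrecv + 1, reqs.data(), MPI_STATUSES_IGNORE);
      // 4. do simulation
  }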

If you really want threads, another option is to start each worker only
after MPI_Waitany/MPI_Waitsome has completed its recv, as sketched below.
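A rough sketch of that variant (process_message(), the peer rank, tags and
sizes are placeholders); in this variant only the master thread calls MPI,
so MPI_THREAD_FUNNELED should be enough:

  #include <mpi.h>
  #include <functional>
  #include <thread>
  #include <vector>

  // placeholder for whatever each worker does with its message
  void process_message(std::vector<float> &msg) { (void)msg; }

  void recv_and_dispatch(int peer, int nrecv, int nb) {
      std::vector<std::vector<float> > rbuf(nrecv, std::vector<float>(nb));
      std::vector<MPI_Request> reqs(nrecv);
      std::vector<std::thread> workers;

      for (int i = 0; i < nrecv; i++)
          MPI_Irecv(rbuf[i].data(), nb, MPI_FLOAT, peer, 100 + i,
                    MPI_COMM_WORLD, &reqs[i]);

      for (int done = 0; done < nrecv; done++) {
          int idx;
          // start a worker only once its message has arrived
          MPI_Waitany(nrecv, reqs.data(), &idx, MPI_STATUS_IGNORE);
          workers.emplace_back(process_message, std::ref(rbuf[idx]));
      }
      for (auto &w : workers) w.join();
  }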

Once again, I do not know your full program, so I can only guess.
You might also want to try another MPI flavor (such as MPICH), since your
program could be correct and the deadlock might be Open MPI specific.

Cheers,

Gilles

On Wednesday, November 4, 2015, ABE Hiroshi <hab...@gmail.com> wrote:

> Dear Gilles-san and all,
>
> I thought MPI_Isend kept the sent data queued somewhere, waiting for the
> corresponding MPI_Irecv.
> The outline of my code regarding MPI:
> 1. send ALL tagged messages to the other node (MPI_Isend) in the master
> thread, then launch worker threads, and
> 2. receive the corresponding tagged messages from the other node
> (MPI_Irecv) in the worker threads.
> 3. do simulation
>
> Doesn’t it work? How silly I was. I coded several sample programs but I
> couldn’t find the problem.
>
> Okay, should I understand that both MPI_Send/Recv and MPI_Isend/Irecv must
> be called sequentially, just like PUSH/POP with a stack?
>
> With my simulation algorithms, the order of sent and received messages
> cannot be sequential by default. In that case, how do you build the MPI
> messaging? Should the order of the MPI messages be sent to the destination
> node first?
>
> Thank you in advance for your suggestions.
>
> Sincerely
> Hiroshi ABE
>
> On 2015/11/04 18:10, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>
> > Abe-san,
> >
> > MPI_Isend followed by MPI_Wait is equivalent to MPI_Send
> >
> > Depending on message size and in-flight messages, that can deadlock if
> > two tasks send to each other and no recv has been posted.
> >
> > Cheers,
> >
> > Gilles
> >
> > ABE Hiroshi <hab...@gmail.com> wrote:
> >> Dear All,
> >>
> >> I installed openmpi 1.10.0 and gcc-5.2 using Fink
> >> (http://www.finkproject.org), but nothing has changed with my code.
> >>
> >> Regarding the MPI_Finalize error in my previous mail, it was my fault.
> >> After I manually removed all the MPI stuff in /usr/local/ and installed
> >> openmpi-1.10.0, the error message no longer appeared. Maybe some old
> >> openmpi files had still remained there.
> >>
> >> Anyway, I found the cause of my problem. The code is:
> >>
> >> void
> >> Block::MPISendEqualInterChangeData( DIRECTION dir, int rank, int id ) {
> >>
> >>   GetEqualInterChangeData( dir, cf[0] );
> >>
> >>   int N = GetNumGrid();
> >>   int nb = 6*N*N*1;
> >>   nb = 1010;
> >> //    float *buf = new float[ nb ];
> >>   float *buf = (float *)malloc( sizeof(float)*nb);
> >>   for( int i = 0; i < nb; i++ ) buf[i] = 0.0;
> >>
> >>   MPI_Request req;
> >>   MPI_Status  status;
> >>
> >>   int tag = 100 * id + (int)dir;
> >>
> >>   MPI_Isend( buf, nb, MPI_REAL4, rank, tag, MPI_COMM_WORLD, &req );
> >>   MPI_Wait( &req, &status );
> >>
> >> //    delete [] buf;
> >>   free( buf );
> >> }
> >>
> >> This works. If the “nb” value changes to more than “1010”, MPI_Wait
> >> will stall. This means the upper limit of MPI_Isend would be
> >> 4 x 1010 = 4040 bytes.
> >>
> >> If this is true, is there any way to increase it? I guess this
> >> conclusion must be wrong, and there should be something wrong with my
> >> system.
> >>
> >> Any idea and suggestions are really appreciated.
> >>
> >> Thank you.
> >>
> >> On 2015/11/03 8:05, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> >>
> >>> On Oct 29, 2015, at 10:24 PM, ABE Hiroshi <hab...@gmail.com> wrote:
> >>>>
> >>>> Regarding the code I mentioned in my original mail, the behaviour is
> >>>> very weird: if MPI_Isend is called from a differently named function,
> >>>> it works.
> >>>> I also wrote a sample program to try to reproduce my problem, but it
> >>>> works fine, except for the MPI_Finalize problem.
> >>>>
> >>>> So I decided to build gcc-5.2 and build openmpi with it, which seems
> >>>> to be a recommendation of the Fink project.
> >>>
> >>> Ok.  Per the prior mail, if you can make a small reproducer, that
> would be most helpful in tracking down the issue.
> >>>
> >>> Thanks!
> >>
>
> ABE Hiroshi
>  from Tokorozawa, JAPAN
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/11/27987.php
