A reproducer without the receiver part has limited usability.

1) Have you checked that your code doesn't suffer from the eager problem? If your message size is under the eager limit, the code may appear to work when in fact the message is just sitting in the unexpected-message queue on the receiver and may never be matched. Conversely, when the message is larger than the eager size (which is network dependent), the code will obviously stall in MPI_Wait, because the send is never matched. The latter is the expected, well-defined behavior according to the MPI standard.
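To make this concrete, here is a minimal sketch of what a complete reproducer (with the receiver part) could look like. The count, tag, and datatype are placeholders rather than values from your code; the point is that once the matching receive is posted, MPI_Wait on the sender completes no matter whether the message goes over the eager or the rendezvous protocol:

#include <mpi.h>
#include <vector>

int main( int argc, char **argv )
{
    MPI_Init( &argc, &argv );

    int rank;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    const int nb = 100000;          /* deliberately above any reasonable eager limit */
    std::vector<float> buf( nb, 0.0f );

    MPI_Request req;
    MPI_Status status;

    if( rank == 0 ) {
        /* above the eager limit this send only completes once rank 1 posts its receive */
        MPI_Isend( buf.data(), nb, MPI_FLOAT, 1, 100, MPI_COMM_WORLD, &req );
        MPI_Wait( &req, &status );
    } else if( rank == 1 ) {
        MPI_Irecv( buf.data(), nb, MPI_FLOAT, 0, 100, MPI_COMM_WORLD, &req );
        MPI_Wait( &req, &status );
    }

    MPI_Finalize();
    return 0;
}

If a stand-alone program like this (run with at least two processes) stalls on your machine, that is exactly the reproducer we need; if it does not, the first place to look is how, and whether, the matching receives are posted in your application.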
2) To rule this out, add a lock around your sends to make sure that 1) a sequential version of the code is valid; and 2) we are not facing some consistent thread-interleaving issue. A sketch of the kind of serialization I mean is at the bottom of this message, below the quoted mail. If this step completes successfully, then we can start looking deeper into the OMPI internals for a bug.

George.

On Wed, Nov 4, 2015 at 12:34 AM, ABE Hiroshi <hab...@gmail.com> wrote:

> Dear All,
>
> I installed openmpi 1.10.0 and gcc-5.2 using Fink (http://www.finkproject.org), but nothing changed with my code.
>
> Regarding the MPI_Finalize error in my previous mail, it seems to have been my fault. I had removed all the MPI stuff in /usr/local/ manually before openmpi-1.10.0 was installed, and the error message no longer appears. Maybe some files from an old openmpi installation had still been lying around there.
>
> Anyway, I found the cause of my problem. The code is:
>
> void
> Block::MPISendEqualInterChangeData( DIRECTION dir, int rank, int id ) {
>
>     GetEqualInterChangeData( dir, cf[0] );
>
>     int N = GetNumGrid();
>     int nb = 6*N*N*1;
>     nb = 1010;
>     // float *buf = new float[ nb ];
>     float *buf = (float *)malloc( sizeof(float)*nb );
>     for( int i = 0; i < nb; i++ ) buf[i] = 0.0;
>
>     MPI_Request req;
>     MPI_Status status;
>
>     int tag = 100 * id + (int)dir;
>
>     MPI_Isend( buf, nb, MPI_REAL4, rank, tag, MPI_COMM_WORLD, &req );
>     MPI_Wait( &req, &status );
>
>     // delete [] buf;
>     free( buf );
> }
>
> This works. If the "nb" value is set to more than 1010, MPI_Wait stalls.
> That would mean the upper limit for MPI_Isend is 4 x 1010 = 4040 bytes.
>
> If this is true, is there any way to increase it? I guess this cannot be right and there must be something wrong with my system.
>
> Any ideas and suggestions are really appreciated.
>
> Thank you.
>
> On Nov 3, 2015, at 8:05, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>
> > On Oct 29, 2015, at 10:24 PM, ABE Hiroshi <hab...@gmail.com> wrote:
> >>
> >> Regarding the code I mentioned in my original mail, the behaviour is very weird: when MPI_Isend is called from a differently named function, it works.
> >> I also wrote a sample program to try to reproduce my problem, but it works fine, except for the MPI_Finalize problem.
> >>
> >> So I decided to build gcc-5.2 and build openmpi with it, which seems to be the recommendation of the Fink project.
> >
> > Ok. Per the prior mail, if you can make a small reproducer, that would be most helpful in tracking down the issue.
> >
> > Thanks!
>
> ABE Hiroshi
> from Tokorozawa, JAPAN
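To be explicit about point 2) above: here is the kind of serialization I have in mind, as a sketch only. The function name and arguments are placeholders and not your actual routine; it simply makes sure that only one thread at a time is inside the Isend/Wait pair:

#include <mpi.h>
#include <mutex>

static std::mutex send_mutex;

void SerializedSend( float *buf, int nb, int dest, int tag )
{
    /* serialize all sends: one thread at a time in the Isend/Wait pair */
    std::lock_guard<std::mutex> guard( send_mutex );

    MPI_Request req;
    MPI_Status status;

    MPI_Isend( buf, nb, MPI_FLOAT, dest, tag, MPI_COMM_WORLD, &req );
    MPI_Wait( &req, &status );
}

Also keep in mind that if multiple threads make MPI calls at all, MPI must be initialized with MPI_Init_thread and a thread level of MPI_THREAD_MULTIPLE (or the calls must be funneled/serialized to match the level you requested); otherwise the behavior is undefined regardless of the eager question.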