Dennis Luxen wrote:

In MPI, you must complete every MPI_Isend with an MPI_Wait on the request handle
(or a variant such as MPI_Waitall, or an MPI_Test that returns a true flag).  An
uncompleted MPI_Isend leaves resources tied up.

Good point, but that doesn't seem to help. I augmented each MPI_Isend with an MPI_Wait.

What does that mean, exactly? Did you immediately follow each Isend with a Wait? Equivalently, did you replace each Isend with a Send?
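In other words, roughly this (buf, count, dest, and tag are just placeholder names):

    MPI_Request req;
    MPI_Isend(buf, count, MPI_INT, dest, tag, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* completes the send right away */

which behaves the same as the plain blocking form

    MPI_Send(buf, count, MPI_INT, dest, tag, MPI_COMM_WORLD);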

In your original message, you said each process started by sending a 100K request. If that's the case and you use blocking sends (or Isends each completed immediately with a Wait), you're not guaranteed progress: a blocking send of a large message may not return until the matching receive is posted, so two processes that both send before receiving can deadlock. E.g., consider the last example in http://www.mpi-forum.org/docs/mpi-11-html/node41.html#Node41 . But your example code sends only single-int requests, so this shouldn't be an issue for your sample code.
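For concreteness, here is a minimal two-rank sketch of that unsafe pattern (the buffer size and names are made up, not taken from your code):

    #include <mpi.h>

    #define N 100000   /* large enough to exceed eager buffering */

    int main(int argc, char **argv)
    {
        static char sendbuf[N], recvbuf[N];
        int rank, other;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        other = 1 - rank;   /* assumes exactly two ranks */

        /* both ranks send first, then receive -- the "unsafe" ordering */
        MPI_Send(sendbuf, N, MPI_BYTE, other, 0, MPI_COMM_WORLD);
        MPI_Recv(recvbuf, N, MPI_BYTE, other, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

        MPI_Finalize();
        return 0;
    }

With small messages this usually completes out of internal buffering, but with 100K messages each MPI_Send may wait for the matching receive to be posted, so neither rank ever reaches its MPI_Recv.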

Anyhow, I ran your sample code and it hung. Then I replaced Isends with Sends and it ran. So, at that level, I am as yet unable to reproduce your problem.

Now, after a number of messages, one process hangs in MPI_Wait while the other keeps MPI_Iprobe'ing for new messages to receive.

I do not know what symptom to expect from Open MPI with this particular application error, but the one you describe is plausible.

If, on the other hand, I start with the parameter "--mca btl tcp,self", the processes finish communication just fine. I am not exactly sure why this flag helps.
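For what it's worth, "--mca btl tcp,self" tells Open MPI to use only the tcp and self byte-transfer layers, so other BTLs (such as the shared-memory one) are left out of the run.  A typical invocation (the executable name is just a placeholder) looks like

    mpirun --mca btl tcp,self -np 2 ./your_app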
