Eugene Loh wrote:
I'm no expert, but I think it's something like this:

1) If the messages are short, they're sent over to the receiver. If the receiver does not expect them (no MPI_Irecv posted), it buffers them up.

2) If the messages are long, only a little bit is sent over to the receiver. The receiver will take in that little bit, but until an MPI_Irecv is posted it will not signal the sender that any more can be sent.

Are these messages being sent over TCP between nodes?  How long are they?

Each message is 2500 bytes. In this particular case, there are 8 processes on one host and 8 more processes on another host. So, on the same host the communication will be shared memory, and between hosts it will be TCP.
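To make the two cases above concrete, here is a toy model in C of how the receive-side buffering might scale. It is only a sketch of the short/long distinction as described, not Open MPI's actual logic: EAGER_LIMIT_BYTES, the first_frag parameter, and both function names are hypothetical, and the real threshold depends on the transport (shared memory vs. TCP) and on how the library is configured.

```c
#include <stddef.h>

/* Hypothetical threshold, for illustration only. The real limit depends on
   the transport and on the MPI library's configuration. */
#define EAGER_LIMIT_BYTES 4096

typedef enum { PROTO_EAGER, PROTO_RENDEZVOUS } proto_t;

/* Short messages are sent in full right away (case 1); long messages send
   only a small first piece until the receive is posted (case 2). */
static proto_t choose_protocol(size_t msg_bytes, size_t eager_limit)
{
    return msg_bytes <= eager_limit ? PROTO_EAGER : PROTO_RENDEZVOUS;
}

/* Bytes the receiver has to hold for n_unmatched messages that arrived
   before a matching receive was posted. In this toy model an eager message
   costs its full payload and a rendezvous message costs only first_frag
   bytes (a made-up size for the "little bit" that is sent up front). */
static size_t rx_buffered_bytes(size_t n_unmatched, size_t msg_bytes,
                                size_t eager_limit, size_t first_frag)
{
    size_t per_msg =
        (choose_protocol(msg_bytes, eager_limit) == PROTO_EAGER)
            ? msg_bytes
            : first_frag;
    return n_unmatched * per_msg;
}
```

Under these assumptions, rx_buffered_bytes(1000, 2500, EAGER_LIMIT_BYTES, 64) comes to 2,500,000 bytes: if 2500-byte messages fall under the eager limit, every unmatched one is held in full on the receiver.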

From your description, I'm guessing that either...
one process is falling behind the rest for whatever reason, and it's buffering up received messages that haven't yet been matched by an MPI_Irecv.

or...
one process is falling behind, and the messages that the other processes have to send to it are being queued up in a transmit buffer.

Can statistics about the number of buffered messages (either tx or rx) be collected and reported by Open MPI? I suppose it would have to be a snapshot in time, triggered either programmatically or by a signal such as SIGHUP or SIGUSR1.
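In case the library can't do this, here is a rough sketch of what I have in mind at the application level. It is not an Open MPI feature: the counters and the function names are all hypothetical, and the application would have to bump the counters itself around its MPI_Isend/MPI_Irecv calls and their completions. It also can't see the library's internal queue of unexpected messages, only the requests the application knows about.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for sigaction() */
#endif
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical application-level counters, updated by the application
   when it starts and completes nonblocking sends and receives. */
static unsigned long sends_started, sends_completed;
static unsigned long recvs_posted, recvs_completed;

/* Set by the signal handler, cleared by the main loop. */
static volatile sig_atomic_t snapshot_requested = 0;

static void on_sigusr1(int signo)
{
    (void)signo;
    snapshot_requested = 1;   /* only set a flag; no I/O in the handler */
}

/* Install the SIGUSR1 handler. Returns 0 on success, -1 on failure. */
static int install_snapshot_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Call this from the main loop. If a snapshot was requested since the
   last call, print the counters and return 1; otherwise return 0. */
static int maybe_report(FILE *out)
{
    if (!snapshot_requested)
        return 0;
    snapshot_requested = 0;
    fprintf(out, "sends in flight: %lu, recvs outstanding: %lu\n",
            sends_started - sends_completed,
            recvs_posted - recvs_completed);
    return 1;
}
```

The idea would be to call install_snapshot_handler() once after MPI_Init, call maybe_report(stderr) each time around the main loop, and then send SIGUSR1 to whichever rank looks stuck.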

Cheers,
Shaun
