Shaun Jackman wrote:

When running my Open MPI application, I'm seeing three processors that are using five times as much memory as the others when they should all use the same amount of memory. To start the debugging process, I would like to know if it's my application or the Open MPI library that's using the additional memory. Does anyone have any tips on calculating the amount of memory that Open MPI is using at a given moment in time?

My Open MPI application makes significant use of MPI_Irecv and MPI_Send. Every process has exactly one MPI_Irecv request active at any time. When it receives a message, it handles it, possibly transmits a response packet using MPI_Send, and starts a new MPI_Irecv request. What happens if one process is slow and lags behind? Will the messages be buffered at the sender or the receiver? Will the messages be buffered at the Open MPI level or at the OS level, say in a TCP transmit buffer or receive buffer?

I'm no expert, but I think it's something like this:

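1) If the messages are short, they're sent to the receiver right away (the "eager" path). If the receiver isn't expecting them yet (no matching MPI_Irecv posted), the MPI library on the receiving side buffers them until a matching receive is posted.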

2) If the messages are long, only a small first piece is sent to the receiver (a "rendezvous"). The receiver accepts that piece, but it doesn't tell the sender to send the rest until a matching MPI_Irecv is posted.
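
To make the two cases concrete, here is a toy model in C of how much data a lagging receiver might end up holding. It is only a sketch of my understanding, not Open MPI code: the struct, the function names, and the idea of a single size threshold plus a fixed first-fragment size are all assumptions made for illustration, and the real limits depend on the transport and its settings.

```c
#include <stddef.h>

/* Toy model only, not Open MPI code. All names here are hypothetical. */
typedef struct {
    size_t eager_limit;    /* messages up to this size are sent whole */
    size_t first_fragment; /* bytes sent ahead of a longer message */
} protocol_params;

/* Bytes the receiver has to hold for ONE message that arrives before a
 * matching receive has been posted. */
size_t unexpected_bytes(const protocol_params *p, size_t msg_len)
{
    if (msg_len <= p->eager_limit)
        return msg_len; /* case 1: the whole message is buffered */

    /* case 2: only the first piece arrives; the rest waits at the sender */
    return p->first_fragment < msg_len ? p->first_fragment : msg_len;
}

/* Bytes held by a receiver that has `pending` unmatched messages,
 * all of the same length. */
size_t backlog_bytes(const protocol_params *p, size_t msg_len, size_t pending)
{
    return pending * unexpected_bytes(p, msg_len);
}
```

If this picture is right, a process that falls behind while many short messages are sent to it would accumulate a backlog proportional to the number of unmatched messages, which is one way a few ranks could end up using more memory than the rest.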

Are these messages being sent over TCP between nodes? How long are they?
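
As for measuring memory at a given moment: I don't know of a way to get a per-library breakdown, but on Linux each process can at least report its own total resident memory by reading /proc/self/status. The sketch below assumes Linux; the helper names parse_status_kb and read_status_kb are made up for this example. Comparing the reported figure against what the application knows it has allocated itself would give a rough idea of how much is left over for the library and everything else.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line of /proc/<pid>/status, e.g. "VmRSS:\t    1234 kB".
 * Returns the value in kB, or -1 if the line is not the field `key`. */
long parse_status_kb(const char *line, const char *key)
{
    size_t n = strlen(key);
    long kb;

    if (strncmp(line, key, n) != 0 || line[n] != ':')
        return -1;
    if (sscanf(line + n + 1, "%ld", &kb) != 1)
        return -1;
    return kb;
}

/* Look up a field such as "VmRSS" (current resident set) or "VmHWM"
 * (peak resident set) for the calling process.
 * Returns kB, or -1 if the file or the field is unavailable. */
long read_status_kb(const char *key)
{
    char line[256];
    long kb = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return -1; /* not Linux, or /proc not mounted */
    while (kb < 0 && fgets(line, sizeof line, f) != NULL)
        kb = parse_status_kb(line, key);
    fclose(f);
    return kb;
}
```

Each rank could call read_status_kb("VmRSS") at the points of interest and print the result together with its rank, so the three heavy processes can be compared with the others over time.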
