Eugene Loh wrote:
ompi_info -a | grep eager
depends on the BTL.  E.g., sm=4K but tcp is 64K.  self is 128K.
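Here is a minimal sketch of the threshold check behind this. It uses only the per-BTL eager limits quoted above and assumes "4K", "64K" and "128K" mean 4096, 65536 and 131072 bytes. A message at or below a BTL's eager limit is sent eagerly, without waiting for the receiver to post a matching receive. The `<=` comparison and the neglect of any header bytes that may count against the limit are simplifying assumptions. Real limits should be read from `ompi_info` on the installation in question.

```python
# Eager limits as quoted in this thread (assumed to be KiB values).
# Check `ompi_info -a | grep eager` for the values on a given installation.
EAGER_LIMITS = {
    "sm": 4 * 1024,
    "tcp": 64 * 1024,
    "self": 128 * 1024,
}


def sent_eagerly(message_bytes: int, btl: str) -> bool:
    """Return True if a payload of this size fits under the BTL's eager limit.

    Simplification: ignores any per-message header that may count toward the limit.
    """
    return message_bytes <= EAGER_LIMITS[btl]


if __name__ == "__main__":
    msg = 100 * 25  # 100 operations of 25 bytes each per MPI_Send
    for btl in EAGER_LIMITS:
        print(f"{btl}: {msg} bytes eager? {sent_eagerly(msg, btl)}")
```

Under these assumptions a 2500-byte message is below every quoted limit, so it would go out eagerly over sm, tcp and self alike.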

Thanks, Eugene.

On the other hand, I assume the memory imbalance we're talking about is rather severe. Much more than 2500 bytes to be noticeable, I would think. Is that really the situation you're imagining?
The memory imbalance is drastic. I'm expecting 2 GB of memory use per process. The well-behaved processes (13 of 16) use the expected amount of memory; the remaining 3 of 16, the misbehaving processes, use more than twice as much. The specifics vary from run to run, of course. So, yes, there are gigabytes of unexpected memory use to track down.

Umm, how big of a message imbalance do you think you might have? (The inflection in my voice doesn't come out well in e-mail.) Anyhow, that sounds like, um, "lots" of 2500-byte messages.

The message imbalance could be very large. Each process is running pretty close to its memory capacity. If a backlog of messages causes a buffer to grow to the point where the process starts swapping, that process will very quickly fall very far behind. In total, some billion 25-byte operations are sent, which at 100 operations per MPI_Send comes to tens of millions of MPI_Send messages.
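The arithmetic behind these numbers can be sketched as follows. The figures of 25 bytes per operation and 100 operations per MPI_Send come from the thread. Treating "some billion" as exactly 1e9 and the excess memory on a misbehaving process as about 2 GB (decimal) are illustrative assumptions. The backlog estimate counts payload only. Because each buffered message likely costs somewhat more than its payload, the result is, if anything, an overestimate of how many messages would be needed.

```python
OP_BYTES = 25              # bytes per operation, as stated in the thread
OPS_PER_SEND = 100         # operations packed into each MPI_Send
TOTAL_OPS = 1_000_000_000  # "some billion" operations; 1e9 is an illustrative assumption
EXCESS_BYTES = 2 * 10**9   # assumed ~2 GB of unexpected memory on a misbehaving process

msg_bytes = OP_BYTES * OPS_PER_SEND        # payload per MPI_Send
total_sends = TOTAL_OPS // OPS_PER_SEND    # number of MPI_Send calls overall
total_payload = TOTAL_OPS * OP_BYTES       # payload bytes across the whole run
backlog_msgs = EXCESS_BYTES // msg_bytes   # payload-only count of messages to fill the excess

print(f"bytes per MPI_Send:     {msg_bytes}")
print(f"total MPI_Send calls:   {total_sends:,}")
print(f"total payload (bytes):  {total_payload:,}")
print(f"messages to fill ~2 GB: {backlog_msgs:,}")
```

On these assumptions, a backlog of under a million unreceived 2500-byte messages would by itself account for gigabytes of extra memory. That is a small fraction of the roughly ten million sends in the run.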

Cheers,
Shaun
