Eugene Loh wrote:
ompi_info -a | grep eager
depends on the BTL. E.g., sm=4K but tcp is 64K. self is 128K.
Thanks, Eugene.
On the other hand, I assume the memory imbalance we're talking about
is rather severe. Much more than 2500 bytes to be noticeable, I
would think. Is that really the situation you're imagining?
The memory imbalance is drastic. I'm expecting 2 GB of memory use per
process. The behaving processes (13/16) use the expected amount of
memory; the remaining misbehaving processes (3/16) use more than twice
as much memory. The specifics vary from run to run, of course. So, yes,
there are gigs of unexpected memory use to track down.
Umm, how big of a message imbalance do you think you might have? (The
inflection in my voice doesn't come out well in e-mail.) Anyhow, that
sounds like, um, "lots" of 2500-byte messages.
The message imbalance could be very large. Each process is running
pretty close to its memory capacity. If a backlog of messages causes a
buffer to grow to the point where the process starts swapping, it will
very quickly fall very far behind. There are on the order of a billion
25-byte operations being sent in total, or tens of millions of MPI_Send
messages (at 100 operations per MPI_Send).
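
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The sizes and counts come from this thread. The 2 GiB excess, the reading of "4K" as 4096 bytes, and the helper name backlog_for are assumptions for illustration only. It also ignores any per-message overhead inside the MPI library, and nothing here establishes that the excess memory really is queued messages.

```python
# Figures quoted in this thread (approximate).
OP_BYTES = 25                # size of one operation
OPS_PER_SEND = 100           # operations packed into one MPI_Send
TOTAL_OPS = 1_000_000_000    # "on the order of a billion" operations in total

# Eager limits as quoted above; 4K is read as 4 * 1024 bytes (assumption).
EAGER_LIMIT = {"sm": 4 * 1024, "tcp": 64 * 1024, "self": 128 * 1024}

message_bytes = OP_BYTES * OPS_PER_SEND   # payload of one MPI_Send
total_sends = TOTAL_OPS // OPS_PER_SEND   # number of MPI_Send calls overall
total_bytes = TOTAL_OPS * OP_BYTES        # payload moved over the whole run


def backlog_for(extra_bytes):
    """Hypothetical helper: how many queued messages would account for
    extra_bytes of memory, counting payload only (no library overhead)."""
    return extra_bytes // message_bytes


# Assumed excess on a misbehaving process: about 2 GiB over the expected 2 GB.
excess = 2 * 1024**3

print("bytes per message:", message_bytes)
print("total sends:", total_sends)
print("total payload bytes:", total_bytes)
for btl, limit in EAGER_LIMIT.items():
    print(btl, "message fits under eager limit:", message_bytes < limit)
print("messages needed to fill the excess:", backlog_for(excess))
```

Under these assumptions a 2500-byte message is below every eager limit listed, and a backlog of somewhat under a million messages, out of roughly ten million sent, would be enough to account for a couple of gigabytes.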
Cheers,
Shaun