Shaun Jackman wrote:

Eugene Loh wrote:

On the other hand, I assume the memory imbalance we're talking about is rather severe. Much more than 2500 bytes to be noticeable, I would think. Is that really the situation you're imagining?

The memory imbalance is drastic. I'm expecting 2 GB of memory use per process. The behaving processes (13/16) use the expected amount of memory; the remaining (3/16) misbehaving processes use more than twice as much memory. The specifics vary from run to run, of course. So, yes, there are gigabytes of unexpected memory use to track down.

Umm, how big of a message imbalance do you think you might have? (The inflection in my voice doesn't come out well in e-mail.) Anyhow, that sounds like, um, "lots" of 2500-byte messages.

The message imbalance could be very large. Each process is running pretty close to its memory capacity. If a backlog of messages causes a buffer to grow to the point where the process starts swapping, it will very quickly fall very far behind. There are some billion 25-byte operations being sent in total, or tens of millions of MPI_Send messages (at 100 operations per MPI_Send).
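
For a rough sense of scale, the figures above can be worked through as below. This is only a back-of-the-envelope sketch: it assumes "some billion" means exactly 1e9 operations and takes 1 GB as 2**30 bytes, neither of which is stated in the thread, and the variable names are made up for illustration.

```python
# Back-of-the-envelope figures from the thread.
TOTAL_OPERATIONS = 1_000_000_000  # assumption: "some billion" taken as exactly 1e9
BYTES_PER_OPERATION = 25
OPERATIONS_PER_SEND = 100

# Payload carried by one MPI_Send (the 2500-byte message mentioned earlier).
bytes_per_send = BYTES_PER_OPERATION * OPERATIONS_PER_SEND

# Number of MPI_Send calls and total payload over the whole run.
total_sends = TOTAL_OPERATIONS // OPERATIONS_PER_SEND
total_payload_bytes = TOTAL_OPERATIONS * BYTES_PER_OPERATION

# How many backed-up messages it would take to account for 2 GB of
# unexpected memory (assumption: 1 GB = 2**30 bytes, payload only).
extra_bytes = 2 * 2**30
backlog_messages = extra_bytes // bytes_per_send

print(bytes_per_send, total_sends, total_payload_bytes, backlog_messages)
```

Under those assumptions, a backlog of a bit under a million messages, a modest fraction of the total traffic, would be enough to explain a couple of gigabytes, ignoring any per-message overhead in the MPI library.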

Okay. Attached is a "little" note I wrote up illustrating memory profiling with Sun tools. (It's "big" because I ended up including a few screenshots.) The program has a bunch of one-way message traffic and some user-code memory allocation. I then rerun with the receiver sleeping before jumping into action. The messages back up and OMPI ends up allocating a bunch of memory. The tools show you who (user or OMPI) is allocating how much memory and how big of a message backlog develops and how the sender starts stalling out (which is a good thing!). Anyhow, a useful exercise for me and hopefully helpful for you.
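
The effect described above (receiver sleeps, messages back up, sender eventually stalls) can be mimicked with a toy model. To be clear, this is not the attached program and does not use MPI or reflect how OMPI manages its buffers; simulate_backlog and all of its parameters are hypothetical, and the fixed backlog cap merely stands in for whatever flow control the real library applies.

```python
def simulate_backlog(total_messages, receiver_sleep_ticks, drain_per_tick,
                     max_backlog, message_bytes=2500):
    """Toy one-way traffic model (hypothetical; not MPI).

    The sender attempts one message per tick. The receiver does nothing
    for receiver_sleep_ticks ticks, then drains up to drain_per_tick
    messages per tick (must be > 0 so the loop terminates).
    Returns (peak_backlog_bytes, stalled_ticks).
    """
    sent = 0
    backlog = 0   # messages sent but not yet received
    peak = 0      # largest backlog seen, in messages
    stalled = 0   # ticks on which the sender could not send
    tick = 0
    while sent < total_messages or backlog > 0:
        # Sender: stalls whenever the backlog has reached its cap.
        if sent < total_messages:
            if backlog < max_backlog:
                sent += 1
                backlog += 1
            else:
                stalled += 1
        peak = max(peak, backlog)
        # Receiver: idle while sleeping, then drains at a fixed rate.
        if tick >= receiver_sleep_ticks:
            backlog -= min(backlog, drain_per_tick)
        tick += 1
    return peak * message_bytes, stalled


# Prompt receiver versus one that sleeps for the first 500 ticks.
print(simulate_backlog(1000, 0, 1, 100))
print(simulate_backlog(1000, 500, 2, 100))
```

With a prompt receiver the backlog stays at a single message; with a sleeping receiver it climbs to the cap and the sender starts stalling, which is the "good thing" mentioned above since it bounds the memory held by queued messages.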

Attachment: memory-profiling.tar.gz
Description: CPIO file
