I'm curious what causes the hump in the pingpong bandwidth curve when running over shared memory. Here's an example from a fairly antiquated single-socket, 4-core laptop running Linux (2.6.32 kernel). Is this a cache effect, something in OpenMPI itself, or a combination?
[Attached figure: bandwidth_onepair_onenode.png]

Pete