On Fri, 12 Jul 2013, Jeff Squyres (jsquyres) wrote:
> ...
> In short: doing 1 memory copy consumes half the memory bandwidth of 2 memory copies. So when you have lots of MPI processes competing for memory bandwidth, it turns out that having each MPI process use half the bandwidth is a Really Good Idea. :-) This allows more MPI processes to do shared-memory communications before you hit the memory bandwidth bottleneck.
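
To make the arithmetic concrete, here is a minimal C sketch of the two data-movement patterns being compared. It is not Open MPI code; the function names, buffer handling and message size are made up purely for illustration. It just shows that the classic copy-in/copy-out path through a shared buffer issues two memcpy calls per message, while a kernel-assisted single copy (the approach KNEM provides) issues one, so each message generates roughly half the memory traffic.

  /* Illustration only -- not Open MPI's implementation.  Names and
   * sizes are invented for this sketch. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  #define MSG_SIZE (1 << 20)            /* 1 MiB message */

  /* Two-copy path: the sender copies into a shared segment and the
   * receiver copies out again -- two memcpy calls per message. */
  static void two_copy(char *dst, char *shared, const char *src)
  {
      memcpy(shared, src, MSG_SIZE);    /* sender -> shared buffer   */
      memcpy(dst, shared, MSG_SIZE);    /* shared buffer -> receiver */
  }

  /* Single-copy path (the effect KNEM achieves with a kernel-assisted
   * copy): one copy straight from sender to receiver.  Here a plain
   * user-space memcpy stands in for the kernel copy. */
  static void single_copy(char *dst, const char *src)
  {
      memcpy(dst, src, MSG_SIZE);
  }

  int main(void)
  {
      char *src    = malloc(MSG_SIZE);
      char *shared = malloc(MSG_SIZE);
      char *dst    = malloc(MSG_SIZE);

      memset(src, 0xab, MSG_SIZE);

      two_copy(dst, shared, src);
      printf("two-copy path:    2 memcpy calls, %d bytes copied\n",
             2 * MSG_SIZE);

      single_copy(dst, src);
      printf("single-copy path: 1 memcpy call,  %d bytes copied\n",
             MSG_SIZE);

      free(src); free(shared); free(dst);
      return 0;
  }

With many processes exchanging messages at once, the two-copy pattern hits the memory bandwidth ceiling at roughly half the process count of the single-copy pattern, which is the effect described above.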

Hi Jeff,

Lots of useful detail in there - thanks. We have plenty of memory-bound applications in use, so hopefully there's some good news in this.

I was hoping that someone might have some examples of real application behaviour rather than micro-benchmarks. It can be crazy hard to get that information from users.

Unusually for us, we're putting in a second cluster with the same architecture, CPUs, memory and OS as the last one. I might be able to use this as a bigger stick to get some better feedback. If so, I'll pass it on.

> Darius Buntinas, Brice Goglin, et al. wrote an excellent paper about exactly this set of issues; see http://runtime.bordeaux.inria.fr/knem/.
> ...

I'll definitely take a look - thanks again.

All the best,

Mark
--
-----------------------------------------------------------------
Mark Dixon                       Email    : m.c.di...@leeds.ac.uk
HPC/Grid Systems Support         Tel (int): 35429
Information Systems Services     Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------
