On Fri, 12 Jul 2013, Jeff Squyres (jsquyres) wrote:
> ...
> In short: doing 1 memory copy consumes half the memory bandwidth of 2
> memory copies. So when you have lots of MPI processes competing for
> memory bandwidth, it turns out that having each MPI process use half the
> bandwidth is a Really Good Idea. :-) This allows more MPI processes to
> do shared memory communications before you hit the memory bandwidth
> bottleneck.
Hi Jeff,
Lots of useful detail in there - thanks. We have plenty of memory-bound
applications in use, so hopefully there's some good news in this.
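If I've followed the single-copy argument correctly, the contrast is roughly
the one sketched below: in the usual two-copy path the payload is memcpy'd
into a shared segment and then memcpy'd out again, so it crosses memory
twice, whereas a single-copy mechanism lets the receiver pull straight from
the sender's buffer, so it crosses once. The function names here are mine,
and process_vm_readv is only the CMA-style illustration, not Open MPI's or
KNEM's actual interface:

/* Illustrative sketch only. */
#define _GNU_SOURCE
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Two-copy path: bounce through a shared-memory segment. */
void two_copy_send(void *shared_buf, const void *src, size_t len)
{
    memcpy(shared_buf, src, len);   /* copy #1: source buffer -> shared segment */
}

void two_copy_recv(void *dst, const void *shared_buf, size_t len)
{
    memcpy(dst, shared_buf, len);   /* copy #2: shared segment -> destination */
}

/* Single-copy path: receiver reads directly from the sender's address
 * space; the kernel performs one copy of the payload. */
ssize_t single_copy_recv(pid_t sender_pid, void *dst, void *remote_src, size_t len)
{
    struct iovec local  = { .iov_base = dst,        .iov_len = len };
    struct iovec remote = { .iov_base = remote_src, .iov_len = len };
    return process_vm_readv(sender_pid, &local, 1, &remote, 1, 0);
}

If that's right, halving the traffic per message is where the extra headroom
for our memory-bound applications would come from.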
I was hoping that someone might have some examples of real application
behaviour rather than micro-benchmarks. It can be crazy hard to get that
information from users.
Unusually for us, we're putting in a second cluster with the same
architecture, CPUs, memory and OS as the last one. I might be able to use
this as a bigger stick to get some better feedback. If so, I'll pass it
on.
> Darius Buntinas, Brice Goglin, et al. wrote an excellent paper about
> exactly this set of issues; see http://runtime.bordeaux.inria.fr/knem/.
> ...
I'll definitely take a look - thanks again.
All the best,
Mark
--
-----------------------------------------------------------------
Mark Dixon Email : m.c.di...@leeds.ac.uk
HPC/Grid Systems Support Tel (int): 35429
Information Systems Services Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------