Re: [OMPI users] 2 to 1 oversubscription

Nifty Tom Mitchell Thu, 6 Aug 2009 18:36:21 -0400

On Mon, Jul 13, 2009 at 01:24:54PM -0400, Mark Borgerding wrote:
> 
> Here's my advice: Don't trust anyone's advice.  Benchmark it yourself and  
> see.
>
> The problems vary so wildly that only you can tell if your problem will  
> benefit from over-subscription. It really depends on too many factors to  
> accurately predict: schedulers, memory usage, network/interconnect  
> hardware, disk seek times, and probably a hundred other things.
>
> I've even seen mixed results from oversubscribing within a single  
> algorithm.  (Granted this is mostly with the older generation  
> hyperthreading, so I'm not sure how things fare with Nehalem).  The most  
> notable effect I've observed is related to cache use. If the problem  
> fits in cache it is much faster.  With cores sharing cache it can even  
> be advantageous to *undersubscribe* the problem.  i.e. schedule 2  
> processes on a quad core so each can have the full cache.


Mark's advice is stellar: "Benchmark it yourself and see".
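
One way to act on that is a small timing sweep. The following is only a
minimal Python sketch, not anything from Open MPI itself: the helper names
(time_command, sweep, candidate_counts) are made up for illustration, and
it assumes your job can be launched as a plain command line. It times the
same command at half, equal, and double the core count and reports the
median wall time for each.

```python
import statistics
import subprocess
import time


def time_command(cmd, repeats=3):
    """Run cmd (a list of argv strings) `repeats` times.

    Returns the median wall-clock time in seconds.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the job fails, so a broken run is never timed.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def sweep(make_cmd, proc_counts, repeats=3):
    """Time make_cmd(n) for each n in proc_counts; return {n: median_seconds}."""
    return {n: time_command(make_cmd(n), repeats) for n in proc_counts}


def candidate_counts(cores):
    """Under-, fully-, and over-subscribed process counts for `cores` cores."""
    return sorted({max(1, cores // 2), cores, 2 * cores})
```

A hypothetical use on a 4-core node, where ./my_mpi_app is a placeholder
for your own binary and only the basic "mpirun -np N" form is assumed:

    results = sweep(lambda n: ["mpirun", "-np", str(n), "./my_mpi_app"],
                    candidate_counts(4))
    print(min(results, key=results.get), "processes was fastest")

Wall time includes mpirun startup, so use a problem size large enough that
launch overhead does not dominate.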

I suspect that a number of interesting things are hidden under
hyperthreading:
        - application chunk sizing
        - application chunk symmetry
        - cache interactions
        - cache line conflicts
        - MPI primitives
        - MPI message rate interactions
        - MPI bandwidth interactions
        - MPI latency interactions
        - barrier code used in MPI primitives
        - mutex code
        - communication hardware interactions
        - compiler optimizations
        - compiler pipelining
        - compiler flags
        - compiler loop unrolling
        - compiler SIMD instruction use
        - compiler intrinsics
        - library selection and implementation
        - system API choice
        - hardware pipeline use while hyperthreading is active
        - etc.

A naive view of Intel Hyperthreading: in terms of transistor count,
it is economical to share some pipelines between two execution
streams.  In old code and common application mixes, fully
replicated hardware would be idle a lot of the time.

At Pathscale, MPI benchmarks between the in-house compiler and
other modern optimizing compilers were not done with hyperthreading
enabled, because it was routinely slower on interesting benchmarks
(and it required a BIOS change).  YMMV.  What is interesting to
you might be different, so try it.

And at a system level hyperthreading is very interesting,
because things like I/O, X, and numerous kernel tasks
do not need or touch the big blocks of shared transistors that
are the floating-point hardware.



-- 
        T o m  M i t c h e l l 
        Found me a new hat, now what?
