On Sat, Mar 25, 2017 at 7:15 AM Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> On Mar 25, 2017, at 3:04 AM, Ben Menadue <ben.mena...@nci.org.au> wrote:
> >
> > I’m not sure about this. It was my understanding that HyperThreading is
> > implemented as a second set of e.g. registers that share execution units.
> > There’s no division of the resources between the hardware threads, but
> > rather the execution units switch between the two threads as they stall
> > (e.g. cache miss, hazard/dependency, misprediction, …) — kind of like a
> > context switch, but much cheaper. As long as there’s nothing being
> > scheduled on the other hardware thread, there’s no impact on the
> > performance. Moreover, turning HT off in the BIOS doesn’t make more
> > resources available to the now-single hardware thread.
>
> Here's an old post on this list where I cited a paper from the Intel
> Technology Journal. The paper is pretty old at this point (2002, I
> believe?), but it was published near the beginning of the HT technology
> at Intel:
>
> https://www.mail-archive.com/hwloc-users@lists.open-mpi.org/msg01135.html
>
> The paper is attached on that post; see, in particular, the section
> "Single-task and multi-task modes".
>
> All this being said, I'm a software wonk with a decent understanding of
> hardware, but I don't closely follow all the specific details of all
> hardware. So if Haswell / Broadwell / Skylake processors, for example, are
> substantially different from the HT architecture described in that paper,
> please feel free to correct me!

I don't know the details, but HPC centers like NERSC noticed a shift around
Ivy Bridge (Edison) that caused them to enable it:

https://www.nersc.gov/users/computational-systems/edison/performance-and-optimization/hyper-threading/

I know two of the authors of that 2002 paper on HT. I'll ask them for
insight the next time we cross paths.

Jeff

> > This matches our observations on our cluster — there was no
> > statistically significant change in performance between having HT turned
> > off in the BIOS and turning the second hardware thread of each core off
> > in Linux. We run a mix of architectures — Sandy, Ivy, Haswell, and
> > Broadwell (all dual-socket Xeon E5s), and KNL — and this appears to hold
> > true across all of these.
>
> These are very complex architectures; the impacts of enabling/disabling HT
> are going to be highly specific to both the platform and application.
>
> > Moreover, having the second hardware thread turned on in Linux but not
> > used by batch jobs (by cgroup-ing them to just one hardware thread of
> > each core) substantially reduced the performance impact and jitter from
> > the OS — by ~10% in at least one synchronisation-heavy application. This
> > is likely because the kernel began scheduling OS tasks (Lustre, IB,
> > IPoIB, IRQs, Ganglia, PBS, …) on the second, unused hardware thread of
> > each core, which were then run when the batch job’s processes stalled
> > the CPU’s execution units. This is with both a CentOS 6.x kernel and a
> > custom (tickless) 7.2 kernel.
>
> Yes, that's a pretty clever use of HT in an HPC environment. But be aware
> that when you do this you are cutting the on-core pipeline depth that
> applications could otherwise use. In your setup, it sounds like this is
> still a net performance win (which is pretty sweet). But that may not be a
> universal effect.
>
> This is probably a +3 on the existing trend from the prior emails in this
> thread: "As always, experiment to find the best for your hardware and
> jobs."  ;-)
>
> --
> Jeff Squyres
> jsquy...@cisco.com
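For anyone who wants to reproduce the "second hardware thread off in Linux"
comparison without touching the BIOS, here is a minimal sketch of how that
can be done through the kernel's CPU hotplug interface. It assumes the
standard Linux sysfs layout under /sys/devices/system/cpu
(thread_siblings_list and the per-CPU online files); it is illustrative
only, needs root to actually offline anything, and the paths are worth
checking on your own kernel first.

    #!/usr/bin/env python3
    # Sketch: keep only the first hardware thread of each core online.
    # Assumes the standard Linux sysfs topology/hotplug files; run as root to apply.
    import glob

    def parse_cpu_list(text):
        """Parse a sysfs CPU list like '0-3,16-19' into a sorted list of ints."""
        cpus = set()
        for part in text.strip().split(','):
            if not part:
                continue
            if '-' in part:
                lo, hi = part.split('-')
                cpus.update(range(int(lo), int(hi) + 1))
            else:
                cpus.add(int(part))
        return sorted(cpus)

    def sibling_groups():
        """One sorted sibling list per physical core (each sibling reports the same list)."""
        groups = set()
        for path in glob.glob('/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list'):
            with open(path) as f:
                groups.add(tuple(parse_cpu_list(f.read())))
        return sorted(groups)

    def offline_secondary_threads(dry_run=True):
        for siblings in sibling_groups():
            for cpu in siblings[1:]:          # keep the first thread of each core
                ctl = '/sys/devices/system/cpu/cpu%d/online' % cpu
                if dry_run:
                    print('would offline CPU %d via %s' % (cpu, ctl))
                else:
                    with open(ctl, 'w') as f:
                        f.write('0')

    if __name__ == '__main__':
        offline_secondary_threads(dry_run=True)   # set dry_run=False (as root) to apply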
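Ben mentioned cgroup-ing batch jobs onto just one hardware thread of each
core; the exact mechanism wasn't given, so the following is only a rough
sketch of one way to do it with a cgroup-v1 cpuset. The controller mount
point /sys/fs/cgroup/cpuset and the group name "batch" are assumptions for
illustration; in practice the resource manager would normally create these
groups itself.

    #!/usr/bin/env python3
    # Sketch: confine a process tree to the first hardware thread of each core
    # using a cgroup-v1 cpuset. The mount point and the 'batch' group name are
    # assumptions for illustration.
    import glob
    import os
    import re

    CPUSET_ROOT = '/sys/fs/cgroup/cpuset'
    GROUP = os.path.join(CPUSET_ROOT, 'batch')

    def primary_threads():
        """Lowest-numbered hardware thread of each core, as a cpuset list string."""
        firsts = set()
        for path in glob.glob('/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list'):
            with open(path) as f:
                # the first number in the sibling list is the lowest-numbered thread
                firsts.add(int(re.split(r'[,-]', f.read().strip())[0]))
        return ','.join(str(c) for c in sorted(firsts))

    def make_batch_cpuset():
        os.makedirs(GROUP, exist_ok=True)
        # cpuset.mems must be set before any task can be attached; inherit the
        # memory-node list from the root cpuset.
        with open(os.path.join(CPUSET_ROOT, 'cpuset.mems')) as f:
            mems = f.read().strip()
        with open(os.path.join(GROUP, 'cpuset.mems'), 'w') as f:
            f.write(mems)
        with open(os.path.join(GROUP, 'cpuset.cpus'), 'w') as f:
            f.write(primary_threads())

    def attach(pid):
        """Move a process into the cpuset; its future children inherit it."""
        with open(os.path.join(GROUP, 'tasks'), 'w') as f:
            f.write(str(pid))

    if __name__ == '__main__':
        make_batch_cpuset()
        attach(os.getpid())    # e.g. confine the launcher before it starts the job

Leaving the second thread online but outside the job's cpuset is what lets
the kernel push Lustre, IRQ, and daemon work onto the otherwise idle
sibling, which is the effect Ben describes. If you would rather handle it
at the MPI level, binding ranks to cores instead of hardware threads (e.g.
mpirun --bind-to core in recent Open MPI) leaves the sibling threads
similarly unused by the job.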
--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/