On Sat, Mar 25, 2017 at 7:15 AM Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> On Mar 25, 2017, at 3:04 AM, Ben Menadue <ben.mena...@nci.org.au> wrote:
> >
> > I’m not sure about this. It was my understanding that HyperThreading is
> > implemented as a second set of e.g. registers that share execution units.
> > There’s no division of the resources between the hardware threads, but
> > rather the execution units switch between the two threads as they stall
> > (e.g. cache miss, hazard/dependency, misprediction, …) — kind of like a
> > context switch, but much cheaper. As long as there’s nothing being
> > scheduled on the other hardware thread, there’s no impact on the
> > performance. Moreover, turning HT off in the BIOS doesn’t make more
> > resources available to the now-single hardware thread.
>
> Here's an old post on this list where I cited a paper from the Intel
> Technology Journal. The paper is pretty old at this point (2002, I
> believe?), but it was published near the beginning of the HT technology
> at Intel:
>
> https://www.mail-archive.com/hwloc-users@lists.open-mpi.org/msg01135.html
>
> The paper is attached on that post; see, in particular, the section
> "Single-task and multi-task modes".
>
> All this being said, I'm a software wonk with a decent understanding of
> hardware, but I don't closely follow all the specific details of all
> hardware. So if Haswell / Broadwell / Skylake processors, for example, are
> substantially different from the HT architecture described in that paper,
> please feel free to correct me!

I don't know the details, but HPC centers like NERSC noticed a shift around
Ivy Bridge (Edison) that caused them to enable it:

https://www.nersc.gov/users/computational-systems/edison/performance-and-optimization/hyper-threading/

I know two of the authors of that 2002 paper on HT. I'll ask them for
insight the next time we cross paths.

Jeff

> > This matches our observations on our cluster — there was no
> > statistically significant change in performance between having HT turned
> > off in the BIOS and turning the second hardware thread of each core off
> > in Linux. We run a mix of architectures — Sandy, Ivy, Haswell, and
> > Broadwell (all dual-socket Xeon E5s), and KNL — and this appears to hold
> > true across all of these.
>
> These are very complex architectures; the impacts of enabling/disabling HT
> are going to be highly specific to both the platform and application.
>
> > Moreover, having the second hardware thread turned on in Linux but not
> > used by batch jobs (by cgroup-ing them to just one hardware thread of
> > each core) substantially reduced the performance impact and jitter from
> > the OS — by ~10% in at least one synchronisation-heavy application. This
> > is likely because the kernel began scheduling OS tasks (Lustre, IB,
> > IPoIB, IRQs, Ganglia, PBS, …) on the second, unused hardware thread of
> > each core, which were then run when the batch job’s processes stalled
> > the CPU’s execution units. This is with both a CentOS 6.x kernel and a
> > custom (tickless) 7.2 kernel.
>
> Yes, that's a pretty clever use of HT in an HPC environment. But be aware
> that when you do this you are cutting the on-core pipeline depth that
> applications could otherwise use. In your setup, it sounds like this is
> still a net performance win (which is pretty sweet). But that may not be a
> universal effect.
>
> This is probably a +3 on the existing trend from the prior emails in this
> thread: "As always, experiment to find the best for your hardware and
> jobs."  ;-)
>
> --
> Jeff Squyres
> jsquy...@cisco.com
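For anyone who wants to reproduce the "second hardware thread off in Linux"
comparison without touching the BIOS, here is a minimal sketch of how that
can be done through the kernel's CPU hotplug interface. It assumes the
standard Linux sysfs layout under /sys/devices/system/cpu
(thread_siblings_list and the per-CPU online files); it is illustrative
only, needs root to actually offline anything, and the paths are worth
checking on your own kernel first.

    #!/usr/bin/env python3
    # Sketch: keep only the first hardware thread of each core online.
    # Assumes the standard Linux sysfs topology/hotplug files; run as root to apply.
    import glob

    def parse_cpu_list(text):
        """Parse a sysfs CPU list like '0-3,16-19' into a sorted list of ints."""
        cpus = set()
        for part in text.strip().split(','):
            if not part:
                continue
            if '-' in part:
                lo, hi = part.split('-')
                cpus.update(range(int(lo), int(hi) + 1))
            else:
                cpus.add(int(part))
        return sorted(cpus)

    def sibling_groups():
        """One sorted sibling list per physical core (each sibling reports the same list)."""
        groups = set()
        for path in glob.glob('/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list'):
            with open(path) as f:
                groups.add(tuple(parse_cpu_list(f.read())))
        return sorted(groups)

    def offline_secondary_threads(dry_run=True):
        for siblings in sibling_groups():
            for cpu in siblings[1:]:          # keep the first thread of each core
                ctl = '/sys/devices/system/cpu/cpu%d/online' % cpu
                if dry_run:
                    print('would offline CPU %d via %s' % (cpu, ctl))
                else:
                    with open(ctl, 'w') as f:
                        f.write('0')

    if __name__ == '__main__':
        offline_secondary_threads(dry_run=True)   # set dry_run=False (as root) to apply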
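Ben mentioned cgroup-ing batch jobs onto just one hardware thread of each
core; the exact mechanism wasn't given, so the following is only a rough
sketch of one way to do it with a cgroup-v1 cpuset. The controller mount
point /sys/fs/cgroup/cpuset and the group name "batch" are assumptions for
illustration; in practice the resource manager would normally create these
groups itself.

    #!/usr/bin/env python3
    # Sketch: confine a process tree to the first hardware thread of each core
    # using a cgroup-v1 cpuset. The mount point and the 'batch' group name are
    # assumptions for illustration.
    import glob
    import os
    import re

    CPUSET_ROOT = '/sys/fs/cgroup/cpuset'
    GROUP = os.path.join(CPUSET_ROOT, 'batch')

    def primary_threads():
        """Lowest-numbered hardware thread of each core, as a cpuset list string."""
        firsts = set()
        for path in glob.glob('/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list'):
            with open(path) as f:
                # the first number in the sibling list is the lowest-numbered thread
                firsts.add(int(re.split(r'[,-]', f.read().strip())[0]))
        return ','.join(str(c) for c in sorted(firsts))

    def make_batch_cpuset():
        os.makedirs(GROUP, exist_ok=True)
        # cpuset.mems must be set before any task can be attached; inherit the
        # memory-node list from the root cpuset.
        with open(os.path.join(CPUSET_ROOT, 'cpuset.mems')) as f:
            mems = f.read().strip()
        with open(os.path.join(GROUP, 'cpuset.mems'), 'w') as f:
            f.write(mems)
        with open(os.path.join(GROUP, 'cpuset.cpus'), 'w') as f:
            f.write(primary_threads())

    def attach(pid):
        """Move a process into the cpuset; its future children inherit it."""
        with open(os.path.join(GROUP, 'tasks'), 'w') as f:
            f.write(str(pid))

    if __name__ == '__main__':
        make_batch_cpuset()
        attach(os.getpid())    # e.g. confine the launcher before it starts the job

Leaving the second thread online but outside the job's cpuset is what lets
the kernel push Lustre, IRQ, and daemon work onto the otherwise idle
sibling, which is the effect Ben describes. If you would rather handle it
at the MPI level, binding ranks to cores instead of hardware threads (e.g.
mpirun --bind-to core in recent Open MPI) leaves the sibling threads
similarly unused by the job.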
--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/