On 3/26/2014 6:45 AM, Andreas Schäfer wrote:
On 10:27 Wed 26 Mar, Jeff Squyres (jsquyres) wrote:
Be aware of a few facts, though:

1. There is a fundamental difference between disabling
hyperthreading in the BIOS at power-on time and simply running one
MPI process per core.  Disabling HT at power-on allocates more
hardware resources to the one HT that remains in each core
(e.g., deeper queues).
Oh, I didn't know that. That's interesting! Do you have any links with
in-depth info on that?


On certain Intel CPUs, the full-size instruction TLB was available to a process only when HyperThreading was disabled in the BIOS setup menu, and that was also the only way to make all the write-combining buffers available to a single process. Those CPUs are no longer in widespread use.

At one time, at Intel, we did a study to evaluate the net effect (on a later CPU where disabling HT did not recover any ITLB size). The results were buried afterwards; possibly they didn't meet an unspecified marketing goal. Typical applications ran 1% faster with HyperThreading disabled in the BIOS menu than with it enabled, even when affinities were carefully set to use just one process per core. Not all applications showed a loss on all data sets with HT left enabled, and a few MPI applications with specialized threading could gain 10% or more from HT.
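The "affinities carefully set to use just one process per core" side of that comparison can be sketched on Linux. This is a minimal Python sketch, assuming a Linux system that exposes the sysfs files `/sys/devices/system/cpu/online` and `.../cpuN/topology/thread_siblings_list`; the helper names are hypothetical. It restricts the calling process to the lowest-numbered hardware thread of each physical core. It cannot reclaim the per-core resources that a BIOS-level HT disable reassigns, which is exactly the difference discussed above.

```python
import os

# Assumes the Linux sysfs CPU topology layout; all helper names are hypothetical.


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as "0,4" or "0-1,8-9" into sorted ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Keep the lowest-numbered logical CPU of every physical core.

    Each entry is one logical CPU's thread_siblings_list; all siblings on the
    same core report the same list, so its first element identifies the core.
    """
    return sorted({parse_cpu_list(text)[0] for text in sibling_lists})


def read_sibling_lists(root="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every online logical CPU (Linux only)."""
    with open(os.path.join(root, "online")) as f:
        online = parse_cpu_list(f.read())
    lists = []
    for cpu in online:
        path = os.path.join(root, f"cpu{cpu}", "topology", "thread_siblings_list")
        with open(path) as f:
            lists.append(f.read())
    return lists


def pin_to_one_thread_per_core():
    """Restrict the calling process to one hardware thread per physical core."""
    allowed = os.sched_getaffinity(0)  # CPUs this process may already run on
    cores = [c for c in one_cpu_per_core(read_sibling_lists()) if c in allowed]
    if cores:
        os.sched_setaffinity(0, cores)
    return cores
```

Calling `pin_to_one_thread_per_core()` at process start leaves the sibling hardware threads idle, but it does not give each MPI rank a core of its own. Per-rank placement is normally left to the MPI launcher's binding options (for example, Open MPI's `mpirun --bind-to core`).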

In my personal opinion, SMT becomes less interesting as the number of independent cores increases. Intel(r) Xeon Phi(tm) is an exception, because the vector processing unit issues instructions from any single thread only on alternate cycles, so at least two threads per core are needed to keep it busy. This capability is used more effectively by running OpenMP threads under MPI, e.g., 6 ranks per coprocessor with 30 threads each, each rank spread across 10 cores (the exact optimum depends on the application; MKL libraries use all available hardware threads for sufficiently large data sets).
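The arithmetic behind that example layout can be checked with a short sketch. It assumes a coprocessor with 60 usable cores and 4 hardware threads per core (typical of first-generation Xeon Phi parts, but not stated in the post); `PhiLayout` and its methods are hypothetical names for illustration. Six ranks of 10 cores cover the 60 cores, and 30 OpenMP threads per rank over 10 cores gives 3 threads per core. That is more than one thread per core to feed the vector unit on alternate cycles, while staying under the 4-thread hardware limit.

```python
from dataclasses import dataclass


@dataclass
class PhiLayout:
    # Hypothetical helper; the 60-core / 4-thread figures below are assumptions.
    cores: int             # physical cores per coprocessor
    threads_per_core: int  # hardware threads per core
    ranks: int             # MPI ranks per coprocessor
    omp_threads: int       # OpenMP threads per rank

    def cores_per_rank(self):
        if self.cores % self.ranks:
            raise ValueError("cores do not divide evenly among ranks")
        return self.cores // self.ranks

    def threads_per_core_used(self):
        per_rank = self.cores_per_rank()
        if self.omp_threads % per_rank:
            raise ValueError("threads do not divide evenly among a rank's cores")
        used = self.omp_threads // per_rank
        if used > self.threads_per_core:
            raise ValueError("more threads per core than the hardware provides")
        return used

    def rank_cores(self, rank):
        """Contiguous block of physical core indices assigned to one rank."""
        n = self.cores_per_rank()
        return range(rank * n, (rank + 1) * n)


layout = PhiLayout(cores=60, threads_per_core=4, ranks=6, omp_threads=30)
print(layout.cores_per_rank())         # 10
print(layout.threads_per_core_used())  # 3
print(layout.rank_cores(1))            # range(10, 20)
```

Contiguous blocks are used here only because they keep each rank's threads on neighbouring cores; the post itself notes that the best split depends on the application.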

--
Tim Prince
