Hi,

On 24.03.2017 at 20:39, Jeff Squyres (jsquyres) wrote:

> Limiting MPI processes to hyperthreads *helps*, but current generation Intel 
> hyperthreads are not as powerful as cores (they have roughly half the 
> resources of a core), so -- depending on your application and your exact 
> system setup -- you will almost certainly see performance degradation of 
> running N MPI processes across N cores vs. across N hyper threads.  You can 
> try it yourself by running the same size application over N cores on a single 
> machine, and then run the same application over N hyper threads (i.e., N/2 
> cores) on the same machine.
> 
> […]
> 
> - Disabling HT in the BIOS means that the one hardware thread left in each 
> core will get all the cores resources (buffers, queues, processor units, 
> etc.).
> - Enabling HT in the BIOS means that each of the 2 hardware threads will 
> statically be allocated roughly half the core's resources (buffers, queues, 
> processor units, etc.).

Do you have a reference for the two topics above (sure, I will try it next week 
on my own)? My understanding was that there is no dedicated HT core, and that 
using all cores will not give the result that the real cores get N x 100% plus 
the HT ones N x 50% (or the like). Rather, the scheduler inside the CPU 
balances the resources between the two faces of a single core, and both are 
equal.
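As a quick way to see this on Linux: the kernel's sysfs topology lists, for 
every logical CPU, which hardware threads share its physical core - with HT on, 
each core simply shows two equal siblings, not a "real" core plus a lesser HT 
one. A minimal sketch (the sysfs path is standard Linux; the script itself is 
only my illustration):

import glob

# Each logical CPU exposes the set of hardware threads sharing its core.
seen = set()
for path in sorted(glob.glob(
        "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list")):
    with open(path) as f:
        siblings = f.read().strip()  # e.g. "0,8": CPUs 0 and 8 share one core
    if siblings not in seen:
        seen.add(siblings)
        print("physical core -> hardware threads:", siblings)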


> […]
> Spoiler alert: many people have looked at this.  In *most* (but not all) 
> cases, using HT is not a performance win for MPI/HPC codes that are designed 
> to run processors at 100%.

I think it was also on this mailing list that someone mentioned that the 
pipelines in the CPU are reorganized when you switch HT off: as only half of 
them would be needed, these resources are then bound to the real cores too, 
extending their performance. Similar to, but not exactly, what Jeff mentions 
above.

Another aspect is that, even if they are not really doubling the performance, 
one might get 150%. And if you pay per CPU hour, it can be worth having it 
switched on.
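
As a toy calculation (the 150% is only the illustrative figure from above, not 
a measurement):

# Toy numbers: same node, billed per node-hour, fixed amount of work.
work = 1.0                 # normalized computation to finish
throughput_ht_off = 1.0    # baseline: one process per core
throughput_ht_on = 1.5     # hypothetical 150% with HT, as above

print("node-hours, HT off:", work / throughput_ht_off)  # 1.00
print("node-hours, HT on: ", work / throughput_ht_on)   # ~0.67

# Caveat: if the site counts each hyperthread as a "CPU" and bills per
# CPU-hour, HT doubles the billed CPUs for only 1.5x the throughput,
# and the advantage flips.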

My personal experience is that it depends not only on the application, but also 
on the way you oversubscribe. Using all cores for a single MPI application 
leads to the effect that all processes are doing the same stuff at the same 
time (at least often) and fight for the same parts of the CPU, which 
essentially becomes a bottleneck. But using each half of a CPU for two (or even 
more) applications allows a better interleaving in the demand for resources. To 
allow this in the best way: no taskset or binding to cores; let the Linux 
kernel and the CPU do their best - YMMV.
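
To check that a process really is left unbound (Linux only; with Open MPI one 
would launch via "mpirun --bind-to none"), a tiny sketch:

import os

# The kernel may place an unbound process on any of these hardware threads,
# which is what lets it interleave two applications across core siblings.
print("may run on CPUs:", sorted(os.sched_getaffinity(0)))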

-- Reuti