On Mon, 29 Apr 2024, Daniel P. Berrangé wrote:

> On Wed, Apr 24, 2024 at 03:56:57PM -0700, Richard Henderson wrote:
> > From: Alexander Monakov <[email protected]>
> >
> > Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD
> > routines are invoked much more rarely in normal use when most buffers
> > are non-zero. This makes use of AVX512 unprofitable, as it incurs extra
> > frequency and voltage transition periods during which the CPU operates
> > at reduced performance, as described in
> > https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html
>
> This is describing limitations of Intel's AVX512 implementation.
>
> AMD's AVX512 implementation is said to not have the kind of
> power / frequency limitations that Intel's does:
>
>   https://www.mersenneforum.org/showthread.php?p=614191
>
>   "Overall, AMD's AVX512 implementation beat my expectations.
>    I was expecting something similar to Zen1's "double-pumping"
>    of AVX with half the register file and cross-lane instructions
>    being super slow. But this is not the case on Zen4. The lack
>    of power or thermal issues combined with stellar shuffle support
>    makes it completely worthwhile to use from a developer standpoint.
>    If your code can vectorize without excessive wasted computation,
>    then go all the way to 512-bit. AMD not only made this worthwhile,
>    but *incentivizes* it with the power savings. And if in the future
>    AMD decides to widen things up, you may get a 2x speedup for free."
>
> IOW, it sounds like we could be sacrificing performance on modern
> AMD Genoa generation CPUs by removing the AVX512 impl

No, the new implementation saturates load ports, and Genoa runs 512-bit
AVX instructions at half throughput compared to their 256-bit counterparts
(so one 512-bit load or two 256-bit loads per cycle), so there's no obvious
reason why this patch would sacrifice performance there.

Maybe it could, indirectly, by lowering the turbo clock limit due to higher
front-end activity, but I don't have access to a Zen 4 machine to check,
and even so it would be a few percent, not 2x.

Alexander
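
To make the load-port arithmetic above concrete, here is a minimal sketch
in C. The helper names are hypothetical, and the per-cycle load counts are
simply the figures stated above for Genoa (two 256-bit loads or one 512-bit
load per cycle), not measurements:

```c
#include <stddef.h>

/*
 * Hypothetical helper: peak bytes loaded per cycle, given how many
 * loads of a given vector width the core can issue each cycle.
 */
static inline size_t load_bytes_per_cycle(size_t loads_per_cycle,
                                          size_t vector_bits)
{
    return loads_per_cycle * (vector_bits / 8);
}

/* Assumed figures taken from the text above, not measured on hardware. */
static inline size_t genoa_avx2_bytes_per_cycle(void)
{
    return load_bytes_per_cycle(2, 256);    /* two 256-bit loads */
}

static inline size_t genoa_avx512_bytes_per_cycle(void)
{
    return load_bytes_per_cycle(1, 512);    /* one 512-bit load */
}
```

Under these assumptions both helpers return the same value, 64 bytes per
cycle, which is why a loop that already saturates the load ports with
256-bit loads would not be expected to gain from 512-bit loads.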
