On Thu, Nov 23, 2023 at 1:49 AM Nathan Bossart <nathandboss...@gmail.com> wrote:
>
> On Wed, Nov 22, 2023 at 02:54:13PM +0200, Ants Aasma wrote:
> > For reference, executing the page checksum 10M times on a AMD 3900X CPU:
> >
> > clang-14 -O2                  4.292s (17.8 GiB/s)
> > clang-14 -O2 -msse4.1         2.859s (26.7 GiB/s)
> > clang-14 -O2 -msse4.1 -mavx2  1.378s (55.4 GiB/s)
>
> Nice.  I've noticed similar improvements with AVX2 intrinsics in simd.h.

If you're thinking of supporting AVX2 anywhere, I'd start with the checksum. There is much less code to review, and less risk.
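
As a rough illustration of why a checksum loop of this kind responds so well to wider vector flags, here is a minimal sketch of a lane-parallel FNV-1a-style mix. It is not the actual page checksum code: the name lane_checksum, the lane count, the seeds, and the shift amount are all assumptions chosen for the example. The point is only that the inner loop carries no dependency between lanes, so the compiler is free to vectorize it when given -msse4.1 or -mavx2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_LANES   32            /* hypothetical number of parallel sums */
#define FNV_PRIME 16777619u     /* the 32-bit FNV prime */

/*
 * Hypothetical helper: lane-parallel FNV-1a-style checksum.
 * len must be a multiple of N_LANES * sizeof(uint32_t).
 */
static uint32_t
lane_checksum(const unsigned char *data, size_t len)
{
    uint32_t sums[N_LANES];
    uint32_t result = 0;
    size_t   nrows = len / (sizeof(uint32_t) * N_LANES);

    assert(len % (sizeof(uint32_t) * N_LANES) == 0);

    /* Arbitrary distinct seeds, one per lane (an assumption of this sketch). */
    for (int j = 0; j < N_LANES; j++)
        sums[j] = 2166136261u + (uint32_t) j;

    for (size_t i = 0; i < nrows; i++)
    {
        uint32_t row[N_LANES];

        /* memcpy avoids alignment and aliasing problems on the input buffer. */
        memcpy(row, data + i * sizeof(row), sizeof(row));

        /* Lanes are independent of each other, so this loop can map to SIMD. */
        for (int j = 0; j < N_LANES; j++)
        {
            uint32_t tmp = sums[j] ^ row[j];

            sums[j] = (tmp * FNV_PRIME) ^ (tmp >> 17);
        }
    }

    /* Fold the per-lane sums into a single value. */
    for (int j = 0; j < N_LANES; j++)
        result ^= sums[j];

    return result;
}
```

Calling lane_checksum(page, 8192) on an 8 kB buffer and building the file with different -m flags should show whether the inner loop is being vectorized. No intrinsics are needed for this, which is part of why it is a small thing to review.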