On Thu, Nov 23, 2023 at 1:49 AM Nathan Bossart <nathandboss...@gmail.com> wrote:
>
> On Wed, Nov 22, 2023 at 02:54:13PM +0200, Ants Aasma wrote:
> > On Wed, 22 Nov 2023 at 11:44, John Naylor <johncnaylo...@gmail.com> wrote:
> >> Poking in those files a bit, I also see references to building with
> >> SSE 4.1. Maybe that's an avenue that we should pursue? (an indirect
> >> function call is surely worth it for page-sized data)
>
> Yes, I think we should, but we also need to be careful not to hurt
> performance on platforms that aren't able to benefit [0] [1].

Well, yes (see popcount using a direct function call on non-x86), but
I don't think it's as important for page-sized data. Also, sse4.1 is
~10 years old, I think.

> There are a couple of other threads about adding support for newer
> instructions [2] [3], and properly detecting the availability of these
> instructions seems to be a common obstacle. We have a path forward for
> stuff that's already using a runtime check (e.g., CRC32C), but I think
> we're still trying to figure out what to do for things that must be inlined
> (e.g., simd.h).
>
> One half-formed idea I have is to introduce some sort of ./configure flag
> that enables all the newer instructions that your CPU understands.

That's not doable, but we should consider taking advantage of
x86-64-v2, which RedHat 9 builds with. That would allow inlining CRC
and popcount there. Not sure how to detect that easily.
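
One possible way to detect it, as a rough sketch under assumptions and not
code from the PostgreSQL tree: GCC and Clang predefine __SSE4_2__ and
__POPCNT__ when the build targets x86-64-v2 (for example with
-march=x86-64-v2), so a header could test those macros and inline the
instruction, and otherwise fall back to a runtime check behind a function
pointer. The names HAVE_X86_64_V2_BASELINE, popcount64, popcount64_impl and
friends are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/*
 * Assumption: the compiler is GCC or Clang. Both predefine __SSE4_2__ and
 * __POPCNT__ when the target enables those features, which x86-64-v2 does.
 * HAVE_X86_64_V2_BASELINE is a hypothetical name, not an existing symbol.
 */
#if defined(__SSE4_2__) && defined(__POPCNT__)
#define HAVE_X86_64_V2_BASELINE 1
#else
#define HAVE_X86_64_V2_BASELINE 0
#endif

#if HAVE_X86_64_V2_BASELINE

/* The build target guarantees POPCNT, so the builtin can be inlined. */
static inline int
popcount64(uint64_t x)
{
    return __builtin_popcountll(x);
}

#else

/* Portable fallback: clear the lowest set bit until none remain. */
static int
popcount64_slow(uint64_t x)
{
    int n = 0;

    while (x != 0)
    {
        x &= x - 1;
        n++;
    }
    return n;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Compiled with POPCNT enabled for this one function only. */
__attribute__((target("popcnt")))
static int
popcount64_hw(uint64_t x)
{
    return __builtin_popcountll(x);
}
#endif

static int popcount64_choose(uint64_t x);

/* Starts at the chooser, which replaces itself on the first call. */
static int (*popcount64_impl) (uint64_t) = popcount64_choose;

static int
popcount64_choose(uint64_t x)
{
    popcount64_impl = popcount64_slow;
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("popcnt"))
        popcount64_impl = popcount64_hw;
#endif
    return popcount64_impl(x);
}

/* Indirect call: the cost the inlined path above avoids. */
static inline int
popcount64(uint64_t x)
{
    return popcount64_impl(x);
}

#endif
```

Callers would use popcount64() either way. Only a build that targets
x86-64-v2 or later gets the inlined instruction; everything else keeps the
runtime check, so platforms that cannot benefit are left on today's path.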