On Mon, Nov 06, 2023 at 09:53:15PM -0800, Noah Misch wrote: > On Mon, Nov 06, 2023 at 09:59:26PM -0600, Nathan Bossart wrote: >> On Mon, Nov 06, 2023 at 07:15:01PM -0800, Noah Misch wrote: >> > The glibc/gcc "ifunc" mechanism was designed to solve this problem of >> > choosing >> > a function implementation based on the runtime CPU, without incurring >> > function >> > pointer overhead. I would not attempt to use AVX512 on non-glibc systems, >> > and >> > I would use ifunc to select the desired popcount implementation on glibc: >> > https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Function-Attributes.html >> >> Thanks, that seems promising for the function pointer cases. I'll plan on >> trying to convert one of the existing ones to use it. BTW it looks like >> LLVM has something similar [0]. >> >> IIUC this unfortunately wouldn't help for cases where we wanted to keep >> stuff inlined, such as is_valid_ascii() and the functions in pg_lfind.h, >> unless we applied it to the calling functions, but that doesn't ѕound >> particularly maintainable. > > Agreed, it doesn't solve inline cases. If the gains are big enough, we should > move toward packages containing N CPU-specialized copies of the postgres > binary, with bin/postgres just exec'ing the right one.
I performed a quick test with ifunc on my x86 machine that ordinarily uses the runtime checks for the CRC32C code, and I actually see a consistent 3.5% regression for pg_waldump -z on 100M 65-byte records. I've attached the patch used for testing. The multiple-copies-of-the-postgres-binary idea seems interesting. That's probably not something that could be enabled by default, but perhaps we could add support for a build option. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
diff --git a/src/include/port/pg_crc32c.h b/src/include/port/pg_crc32c.h index d085f1dc00..6db411ee29 100644 --- a/src/include/port/pg_crc32c.h +++ b/src/include/port/pg_crc32c.h @@ -78,7 +78,7 @@ extern pg_crc32c pg_comp_crc32c_loongarch(pg_crc32c crc, const void *data, size_ #define FIN_CRC32C(crc) ((crc) ^= 0xFFFFFFFF) extern pg_crc32c pg_comp_crc32c_sb8(pg_crc32c crc, const void *data, size_t len); -extern pg_crc32c (*pg_comp_crc32c) (pg_crc32c crc, const void *data, size_t len); +extern pg_crc32c pg_comp_crc32c(pg_crc32c crc, const void *data, size_t len); #ifdef USE_SSE42_CRC32C_WITH_RUNTIME_CHECK extern pg_crc32c pg_comp_crc32c_sse42(pg_crc32c crc, const void *data, size_t len); diff --git a/src/port/pg_crc32c_sse42_choose.c b/src/port/pg_crc32c_sse42_choose.c index 41ff4a35ad..62bb981ee8 100644 --- a/src/port/pg_crc32c_sse42_choose.c +++ b/src/port/pg_crc32c_sse42_choose.c @@ -51,14 +51,14 @@ pg_crc32c_sse42_available(void) * so that subsequent calls are routed directly to the chosen implementation. */ static pg_crc32c -pg_comp_crc32c_choose(pg_crc32c crc, const void *data, size_t len) +(*pg_comp_crc32c_choose (void))(pg_crc32c crc, const void *data, size_t len) { if (pg_crc32c_sse42_available()) - pg_comp_crc32c = pg_comp_crc32c_sse42; + return pg_comp_crc32c_sse42; else - pg_comp_crc32c = pg_comp_crc32c_sb8; - - return pg_comp_crc32c(crc, data, len); + return pg_comp_crc32c_sb8; } -pg_crc32c (*pg_comp_crc32c) (pg_crc32c crc, const void *data, size_t len) = pg_comp_crc32c_choose; +pg_crc32c +pg_comp_crc32c(pg_crc32c crc, const void *data, size_t len) + __attribute__ ((ifunc ("pg_comp_crc32c_choose")));