Attached v10 with minor changes based on the feedback. > Thanks, I think this is a good direction. Some comments: > > #if defined(HAVE__GET_CPUID) > __get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]); #elif defined(HAVE__CPUID) > __cpuid(exx, 1); #endif
Oh yeah good catch. Fixed in v10. > + } > + /* avx512 os support */ > + if (zmm_regs_available()) { > + return; > + } > > // BTW, I do like the gating here that reduces the number of places that have > to > know about zmm and xsave. (Side note: shouldn't that be "if > !(zmm_regs_available())"?) Yup, good catch again 😊 > What do you think? Yup, those look correct to me. Fixed them in v10. > + PG_CPU_FEATURE_PCLMUL = 3, > [...] > + PG_CPU_FEATURE_ARMV8_CRC32C = 100, > > I'm not a fan of exposing these architecture-specific details to places that > consult > the capabilities. That requires things like > > +#define PGCPUCAP_CRC32C pg_cpu_have(PG_CPU_FEATURE_SSE42) > [...] > +#define PGCPUCAP_CRC32C > pg_cpu_have(PG_CPU_FEATURE_ARMV8_CRC32C) > > I'd prefer to have 1 capability <-> one place to check at runtime for > architectures > that need that, and to keep architecture details private to the > initialization step. . > Even for things that test for which function pointer to use, I think it's a > cleaner > interface to look at one thing. Isn't that one thing currently pg_cpu_have(FEATURE)? The reason we have the additional PGCPUCAP_CRC32C is because the dispatch for CRC32C is currently defined in the header file common to both ARM and SSE42. We should ideally have separate dispatch for ARM and x86 in their own files (the way it is in the main branch). This also makes it easier to add more implementations in the future without having to make the dispatch function work for both ARM and x86. > If we're going to have a single file for the init step, we don't need this -- > we'd just > have a different definition of > pg_cpucap_initialize() in each part, with a default that only adds the "init" > slot: > > #if defined( __i386__ ) || defined(i386) || defined(_M_IX86) || > defined(__x86_64__) || defined(__x86_64) || defined(_M_AMD64) > > <cpuid checks> > > #elif defined(__arm__) || defined(__arm) || > defined(__aarch64__) || defined(_M_ARM64) > > <for now only pg_crc32c_armv8_available()> > > #else > <only init> > #endif Makes sense. Got rid of it in v10. Raghuveer
v10-0001-Dispatch-CRC-computation-by-branching-rather-tha.patch
Description: v10-0001-Dispatch-CRC-computation-by-branching-rather-tha.patch
v10-0002-Rename-CRC-choose-files-to-cpucap-files.patch
Description: v10-0002-Rename-CRC-choose-files-to-cpucap-files.patch
v10-0003-Add-a-Postgres-SQL-function-for-crc32c-benchmark.patch
Description: v10-0003-Add-a-Postgres-SQL-function-for-crc32c-benchmark.patch
v10-0004-Improve-CRC32C-performance-on-x86_64.patch
Description: v10-0004-Improve-CRC32C-performance-on-x86_64.patch
v10-0005-Move-all-cpuid-checks-to-one-location.patch
Description: v10-0005-Move-all-cpuid-checks-to-one-location.patch