Attached v10 with minor changes based on the feedback. 

> Thanks, I think this is a good direction. Some comments:
> 
> #if defined(HAVE__GET_CPUID)
> __get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]); #elif defined(HAVE__CPUID)
> __cpuid(exx, 1); #endif

Oh yeah good catch. Fixed in v10. 

> +    }
> +    /* avx512 os support */
> +    if (zmm_regs_available()) {
> +        return;
> +    }
> 
> // BTW, I do like the gating here that reduces the number of places that have 
> to
> know about zmm and xsave. (Side note: shouldn't that be "if
> !(zmm_regs_available())"?)

Yup, good catch again 😊 

> What do you think?

Yup, those look correct to me. Fixed them in v10. 

> +    PG_CPU_FEATURE_PCLMUL            = 3,
> [...]
> +    PG_CPU_FEATURE_ARMV8_CRC32C      = 100,
> 
> I'm not a fan of exposing these architecture-specific details to places that 
> consult
> the capabilities. That requires things like
> 
> +#define PGCPUCAP_CRC32C pg_cpu_have(PG_CPU_FEATURE_SSE42)
> [...]
> +#define PGCPUCAP_CRC32C
> pg_cpu_have(PG_CPU_FEATURE_ARMV8_CRC32C)
> 
> I'd prefer to have 1 capability <-> one place to check at runtime for 
> architectures
> that need that, and to keep architecture details private to the 
> initialization step. .
> Even for things that test for which function pointer to use, I think it's a 
> cleaner
> interface to look at one thing.

Isn't that one thing currently pg_cpu_have(FEATURE)? The reason we have the 
additional PGCPUCAP_CRC32C is because the dispatch for CRC32C is currently 
defined in the header file common to both ARM and SSE42. We should ideally have 
separate dispatch for ARM and x86 in their own files (the way it is in the main 
branch). This also makes it easier to add more implementations in the future 
without having to make the dispatch function work for both ARM and x86. 

> If we're going to have a single file for the init step, we don't need this -- 
> we'd just
> have a different definition of
> pg_cpucap_initialize() in each part, with a default that only adds the "init" 
> slot:
> 
> #if defined( __i386__ ) || defined(i386) || defined(_M_IX86) ||
>   defined(__x86_64__) || defined(__x86_64) || defined(_M_AMD64)
> 
> <cpuid checks>
> 
> #elif defined(__arm__) || defined(__arm) ||
>    defined(__aarch64__) || defined(_M_ARM64)
> 
> <for now only pg_crc32c_armv8_available()>
> 
> #else
> <only init>
> #endif

Makes sense. Got rid of it in v10. 

Raghuveer 

Attachment: v10-0001-Dispatch-CRC-computation-by-branching-rather-tha.patch
Description: v10-0001-Dispatch-CRC-computation-by-branching-rather-tha.patch

Attachment: v10-0002-Rename-CRC-choose-files-to-cpucap-files.patch
Description: v10-0002-Rename-CRC-choose-files-to-cpucap-files.patch

Attachment: v10-0003-Add-a-Postgres-SQL-function-for-crc32c-benchmark.patch
Description: v10-0003-Add-a-Postgres-SQL-function-for-crc32c-benchmark.patch

Attachment: v10-0004-Improve-CRC32C-performance-on-x86_64.patch
Description: v10-0004-Improve-CRC32C-performance-on-x86_64.patch

Attachment: v10-0005-Move-all-cpuid-checks-to-one-location.patch
Description: v10-0005-Move-all-cpuid-checks-to-one-location.patch

Reply via email to