The meson configure check seems to fail on my machine: error: too many arguments to function call, expected 0, have 1 10 | svuint64_t popcnt = svcntb(val); | ~~~~~~ ^~~
error: returning '__SVInt64_t' from a function with incompatible result type 'int' 12 | return popcnt == 0; | ^~~~~~~~~~~ The autoconf version seems to work okay, though. + pgac_save_CFLAGS=$CFLAGS + CFLAGS="$pgac_save_CFLAGS $1" I don't see any extra compiler flag tests used, so we no longer need this, right? + if test x"$pgac_cv_arm_sve_popcnt_intrinsics" = x"yes"; then + pgac_arm_sve_popcnt_intrinsics=yes + fi I'm curious why this doesn't use Ac_cachevar like the examples above it (e.g., PGAC_XSAVE_INTRINSICS). + prog = ''' +#include <arm_sve.h> + +#if defined(__has_attribute) && __has_attribute (target) + __attribute__((target("arch=armv8-a+sve"))) +#endif +int main(void) +{ + const svuint64_t val = svdup_u64(0xFFFFFFFFFFFFFFFF); + svuint64_t popcnt = svcntb(val); + /* return computed value, to prevent the above being optimized away */ + return popcnt == 0; +} +''' This test looks quite different than the autoconf one. Why is that? I would expect them to be the same. And I think ideally the test would check that all the intrinsics functions we need are available. +/* + * Returns true if the CPU supports the instructions required for the SVE + * pg_popcount() implementation. + */ +bool +pg_popcount_sve_available(void) +{ + return getauxval(AT_HWCAP) & HWCAP_SVE; +} pg_crc32c_armv8_available() (in pg_crc32c_armv8_choose.c) looks quite a bit more complicated than this. Are we missing something here? + /* + * For smaller inputs, aligning the buffer degrades the performance. + * Therefore, the buffers only when the input size is sufficiently large. + */ Is the inverse true, i.e., does aligning the buffer improve performance for larger inputs? I'm also curious what level of performance degradation you were seeing. +#include "c.h" +#include "port/pg_bitutils.h" + +#ifdef USE_SVE_POPCNT_WITH_RUNTIME_CHECK nitpick: The USE_SVE_POPCNT_WITH_RUNTIME_CHECK check can probably go above the #include for pg_bitutils.h (but below the one for c.h). -- nathan