> The meson configure check seems to fail on my machine > This test looks quite different than the autoconf one. Why is that? I would expect them to be the same. And I think ideally the test would check that all the intrinsics functions we need are available.
Fixed, both meson and autoconf have the same test program with all the intrinsics. Meson should work now. > + pgac_save_CFLAGS=$CFLAGS > + CFLAGS="$pgac_save_CFLAGS $1" > I don't see any extra compiler flag tests used, so we no longer need this, right? True, removed it. > + if test x"$pgac_cv_arm_sve_popcnt_intrinsics" = x"yes"; then > + pgac_arm_sve_popcnt_intrinsics=yes > + fi > I'm curious why this doesn't use Ac_cachevar like the examples above it (e.g., PGAC_XSAVE_INTRINSICS). Implemented using Ac_cachevar similar to PGAC_XSAVE_INTRINSICS. > +#include "c.h" > +#include "port/pg_bitutils.h" > + > +#ifdef USE_SVE_POPCNT_WITH_RUNTIME_CHECK > nitpick: The USE_SVE_POPCNT_WITH_RUNTIME_CHECK check can probably go above the #include for pg_bitutils.h (but below the one for c.h). Done. > pg_crc32c_armv8_available() (in pg_crc32c_armv8_choose.c) looks quite a bit more complicated than this. Are we missing something here? SVE is only available in aarch64, so we don't need to worry about aarch32. The latest patch includes runtime checks for Linux and FreeBSD. For all other operating systems, false is returned, because we are unable to verify the check. > + /* > + * For smaller inputs, aligning the buffer degrades the performance. > + * Therefore, the buffers only when the input size is sufficiently large. > + */ > Is the inverse true, i.e., does aligning the buffer improve performance for > larger inputs? I'm also curious what level of performance degradation you > were seeing. Here is a comparison of all three cases. Alignment is marginally better for inputs above 1024B, but the difference is small. Unaligned performs better for smaller inputs. Aligned After 128B => the current implementation "if (aligned != buf && bytes > 4 * vec_len)" Always Aligned => condition "bytes > 4 * vec_len" is removed. Unaligned => the whole if block was removed buf | Always Aligned | Aligned After 128B | Unaligned --------+---------------+--------------------+------------ 16 | 37.851 | 38.203 | 34.971 32 | 37.859 | 38.187 | 34.972 64 | 37.611 | 37.405 | 34.121 128 | 45.357 | 45.897 | 41.890 256 | 62.440 | 63.454 | 58.666 512 | 100.120 | 102.767 | 99.861 1024 | 159.574 | 158.594 | 164.975 2048 | 282.354 | 281.198 | 283.937 4096 | 532.038 | 531.068 | 533.699 8192 | 1038.973 | 1038.083 | 1039.206 16384 | 2028.604 | 2025.843 | 2033.940 --- Chiranmoy
v4-0001-SVE-support-for-popcount-and-popcount-masked.patch
Description: v4-0001-SVE-support-for-popcount-and-popcount-masked.patch