> The meson configure check seems to fail on my machine
> This test looks quite different than the autoconf one.  Why is that?  I
would expect them to be the same.  And I think ideally the test would check
that all the intrinsics functions we need are available.

Fixed, both meson and autoconf have the same test program with all the 
intrinsics.
Meson should work now.

> +    pgac_save_CFLAGS=$CFLAGS
> +    CFLAGS="$pgac_save_CFLAGS $1"

> I don't see any extra compiler flag tests used, so we no longer need this,
right?

True, removed it.

> +  if test x"$pgac_cv_arm_sve_popcnt_intrinsics" = x"yes"; then
> +    pgac_arm_sve_popcnt_intrinsics=yes
> +  fi

> I'm curious why this doesn't use Ac_cachevar like the examples above it
(e.g., PGAC_XSAVE_INTRINSICS).

Implemented using Ac_cachevar similar to  PGAC_XSAVE_INTRINSICS.

> +#include "c.h"
> +#include "port/pg_bitutils.h"
> +
> +#ifdef USE_SVE_POPCNT_WITH_RUNTIME_CHECK

> nitpick: The USE_SVE_POPCNT_WITH_RUNTIME_CHECK check can probably go above
the #include for pg_bitutils.h (but below the one for c.h).

Done.

> pg_crc32c_armv8_available() (in pg_crc32c_armv8_choose.c) looks quite a bit
more complicated than this.  Are we missing something here?

SVE is only available in aarch64, so we don't need to worry about aarch32. The 
latest patch
includes runtime checks for Linux and FreeBSD. For all other operating systems, 
false is
returned, because we are unable to verify the check.

> +    /*
> +     * For smaller inputs, aligning the buffer degrades the performance.
> +     * Therefore, the buffers only when the input size is sufficiently large.
> +     */

> Is the inverse true, i.e., does aligning the buffer improve performance for
> larger inputs?  I'm also curious what level of performance degradation you
> were seeing.

Here is a comparison of all three cases. Alignment is marginally better for 
inputs
above 1024B, but the difference is small. Unaligned performs better for smaller 
inputs.
Aligned After 128B => the current implementation "if (aligned != buf && bytes > 
4 * vec_len)"
Always Aligned => condition "bytes > 4 * vec_len" is removed.
Unaligned => the whole if block was removed

 buf    | Always Aligned | Aligned After 128B | Unaligned
--------+---------------+--------------------+------------
   16   |       37.851  |           38.203   |     34.971
   32   |       37.859  |           38.187   |     34.972
   64   |       37.611  |           37.405   |     34.121
  128   |       45.357  |           45.897   |     41.890
  256   |       62.440  |           63.454   |     58.666
  512   |      100.120  |          102.767   |     99.861
 1024   |      159.574  |          158.594   |    164.975
 2048   |      282.354  |          281.198   |    283.937
 4096   |      532.038  |          531.068   |    533.699
 8192   |     1038.973  |         1038.083   |   1039.206
16384   |     2028.604  |         2025.843   |   2033.940

---
Chiranmoy

Attachment: v4-0001-SVE-support-for-popcount-and-popcount-masked.patch
Description: v4-0001-SVE-support-for-popcount-and-popcount-masked.patch

Reply via email to