Pengxuan Zheng <quic_pzh...@quicinc.com> writes:
> This is similar to the recent improvements to the Advanced SIMD popcount
> expansion by using SVE. We can utilize SVE to generate more efficient code for
> scalar mode popcount too.
>
> Changes since v1:
> * v2: Add a new VNx1BI mode and a new test case for V1DI.
> * v3: Abandon VNx1BI changes and add a new variant of aarch64_ptrue_reg.

Sorry for the slow review.

The patch looks good though.  OK with the changes below:

> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt12.c b/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> new file mode 100644
> index 00000000000..f086cae55a2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fgimple" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +

It's probably safer to add:

#pragma GCC target "+nosve"

here, so that we don't try to use the SVE instructions.

> +/*
> +** foo:
> +**   cnt     v0.8b, v0.8b
> +**   addv    b0, v0.8b

Nothing requires the temporary register to be v0, so this should be
something like:

        cnt     (v[0-9]+\.8b), v0\.8b
        addv    b0, \1
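For context on why only the final b0 is fixed while the intermediate register is free: cnt counts the set bits in each of the eight byte lanes, and addv sums those lanes into a single byte. A rough portable C model of that sequence, as a sketch only (the helper names cnt_8b, addv_8b and popcount64_model are made up for illustration and are not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: per-byte population count, modelling what
   "cnt vN.8b, v0.8b" does for each of the eight byte lanes.  */
static void
cnt_8b (const uint8_t in[8], uint8_t out[8])
{
  for (int i = 0; i < 8; i++)
    {
      uint8_t b = in[i], n = 0;
      while (b)
        {
          b &= (uint8_t) (b - 1);  /* Clear the lowest set bit.  */
          n++;
        }
      out[i] = n;
    }
}

/* Hypothetical helper: sum the eight lanes, modelling "addv b0, vN.8b".
   The maximum is 8 * 8 = 64, so the sum fits in one byte.  */
static uint8_t
addv_8b (const uint8_t in[8])
{
  uint8_t sum = 0;
  for (int i = 0; i < 8; i++)
    sum = (uint8_t) (sum + in[i]);
  return sum;
}

/* Scalar model of the two-instruction sequence for a 64-bit input.  */
static uint64_t
popcount64_model (uint64_t x)
{
  uint8_t lanes[8], counts[8];
  for (int i = 0; i < 8; i++)
    lanes[i] = (uint8_t) (x >> (8 * i));
  cnt_8b (lanes, counts);   /* COUNTS plays the role of the temporary.  */
  return addv_8b (counts);
}

/* Small self-check against a few known values.  */
static void
check_model (void)
{
  assert (popcount64_model (0) == 0);
  assert (popcount64_model (0xff) == 8);
  assert (popcount64_model (UINT64_MAX) == 64);
}
```

In this model the counts array stands in for the cnt destination: which register holds it does not affect the result, which is why the pattern captures it and reuses it via \1.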

Thanks,
Richard

> +**   ret
> +*/
> +__Uint64x1_t __GIMPLE
> +foo (__Uint64x1_t x)
> +{
> +  __Uint64x1_t z;
> +
> +  z = .POPCOUNT (x);
> +  return z;
> +}
