Re: [PATCH] aarch64: Improve scalar mode popcount expansion by using SVE [PR113860]

Richard Sandiford Thu, 19 Sep 2024 00:54:35 -0700

Pengxuan Zheng <quic_pzh...@quicinc.com> writes:
> This is similar to the recent improvements to the Advanced SIMD popcount
> expansion by using SVE. We can utilize SVE to generate more efficient code for
> scalar mode popcount too.
>
>       PR target/113860
>
> gcc/ChangeLog:
>
>       * config/aarch64/aarch64-simd.md (popcount<mode>2): Update pattern to
>       also support V1DI mode.
>       * config/aarch64/aarch64.md (popcount<mode>2): Add TARGET_SVE support.
>       * config/aarch64/iterators.md (VDQHSD_V1DI): New mode iterator.
>       (SVE_VDQ_I): Add V1DI.
>       (bitsize): Likewise.
>       (VPRED): Likewise.
>       (VEC_POP_MODE): New mode attribute.
>       (vec_pop_mode): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.target/aarch64/popcnt11.c: New test.


Sorry for the slow review of this.  The main reason for putting it off
was the use of V1DI, which always makes me nervous.

In particular:

> @@ -2284,7 +2286,7 @@ (define_mode_attr VPRED [(VNx16QI "VNx16BI") (VNx8QI "VNx8BI")
>                        (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")
>                        (V8QI "VNx8BI") (V16QI "VNx16BI")
>                        (V4HI "VNx4BI") (V8HI "VNx8BI") (V2SI "VNx2BI")
> -                      (V4SI "VNx4BI") (V2DI "VNx2BI")])
> +                      (V4SI "VNx4BI") (V2DI "VNx2BI") (V1DI "VNx2BI")])
>  

it seems odd to have a predicate mode that contains more elements than
the associated single-vector data mode.
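
To make the mismatch concrete, here is a rough sketch in C that tabulates
the 64-bit Advanced SIMD entries of VPRED.  The struct, table and helper
names are hypothetical and exist only for illustration; the predicate
element counts are the number of lanes each predicate mode governs per
128 bits of vector, which is the minimum SVE vector length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a few VPRED entries.  pred_elems is the number
   of elements the predicate mode controls per 128 bits of vector.  */
struct vpred_entry
{
  const char *data_mode;
  unsigned data_elems;
  const char *pred_mode;
  unsigned pred_elems;
};

static const struct vpred_entry vpred_entries[] = {
  { "V8QI", 8, "VNx8BI", 8 },
  { "V4HI", 4, "VNx4BI", 4 },
  { "V2SI", 2, "VNx2BI", 2 },
  { "V2DI", 2, "VNx2BI", 2 },
  { "V1DI", 1, "VNx2BI", 2 },	/* The entry added by the patch.  */
};

/* Return 1 if the predicate mode paired with DATA_MODE has more
   elements than the data mode itself, 0 if not, -1 if DATA_MODE is
   not in the table.  */
static int
pred_is_wider (const char *data_mode)
{
  size_t n = sizeof (vpred_entries) / sizeof (vpred_entries[0]);
  for (size_t i = 0; i < n; i++)
    if (strcmp (vpred_entries[i].data_mode, data_mode) == 0)
      return vpred_entries[i].pred_elems > vpred_entries[i].data_elems;
  return -1;
}
```

In this model V1DI is the only entry for which pred_is_wider returns 1;
every other listed mode has a predicate with the same element count as
the data mode.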

The patch also extends the non-SVE SIMD popcount pattern for V1DI,
but it doesn't look like that path works.  E.g. try the following
with -march=armv8-a -fgimple -O2:

__Uint64x1_t __GIMPLE
foo (__Uint64x1_t x)
{
  __Uint64x1_t z;

  z = .POPCOUNT (x);
  return z;
}
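
For reference, the result the GIMPLE function should compute can be
sketched in plain C with the GCC/Clang vector extensions.  This is only
a model of the expected semantics, not something the patch contains;
the typedef and helper names are made up for the example.

```c
#include <stdint.h>

/* A 64-bit vector with a single 64-bit lane, which GCC represents as
   V1DI on AArch64.  Requires the GCC/Clang vector_size extension.  */
typedef uint64_t u64x1 __attribute__ ((vector_size (8)));

/* Hypothetical reference helper: population count of the single lane,
   computed with the scalar builtin.  */
static u64x1
popcount_u64x1_ref (u64x1 x)
{
  u64x1 z = { 0 };
  z[0] = (uint64_t) __builtin_popcountll (x[0]);
  return z;
}

/* Hypothetical scalar wrapper, convenient for checking the helper.  */
static uint64_t
popcount_lane0 (uint64_t value)
{
  u64x1 x = { value };
  u64x1 z = popcount_u64x1_ref (x);
  return z[0];
}
```

Whatever path the expander takes for V1DI, foo should return the same
lane value as popcount_u64x1_ref for every input.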

Thanks,
Richard


>  ;; ...and again in lower case.
>  (define_mode_attr vpred [(VNx16QI "vnx16bi") (VNx8QI "vnx8bi")
> @@ -2318,6 +2320,14 @@ (define_mode_attr VDOUBLE [(VNx16QI "VNx32QI")
>                          (VNx4SI "VNx8SI") (VNx4SF "VNx8SF")
>                          (VNx2DI "VNx4DI") (VNx2DF "VNx4DF")])
>  
> +;; The Advanced SIMD modes of popcount corresponding to scalar modes.
> +(define_mode_attr VEC_POP_MODE [(QI "V8QI") (HI "V4HI")
> +                             (SI "V2SI") (DI "V1DI")])
> +
> +;; ...and again in lower case.
> +(define_mode_attr vec_pop_mode [(QI "v8qi") (HI "v4hi")
> +                             (SI "v2si") (DI "v1di")])
> +
>  ;; On AArch64 the By element instruction doesn't have a 2S variant.
>  ;; However because the instruction always selects a pair of values
>  ;; The normal 3SAME instruction can be used here instead.
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt11.c b/gcc/testsuite/gcc.target/aarch64/popcnt11.c
> new file mode 100644
> index 00000000000..595b2f9eb93
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt11.c
> @@ -0,0 +1,58 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv8.2-a+sve" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +/*
> +** f_qi:
> +**   ldr     b([0-9]+), \[x0\]
> +**   cnt     v\1.8b, v\1.8b
> +**   smov    w0, v\1.b\[0\]
> +**   ret
> +*/
> +unsigned
> +f_qi (unsigned char *a)
> +{
> +  return __builtin_popcountg (a[0]);
> +}
> +
> +/*
> +** f_hi:
> +**   ldr     h([0-9]+), \[x0\]
> +**   ptrue   (p[0-7]).b, all
> +**   cnt     z\1.h, \2/m, z\1.h
> +**   smov    w0, v\1.h\[0\]
> +**   ret
> +*/
> +unsigned
> +f_hi (unsigned short *a)
> +{
> +  return __builtin_popcountg (a[0]);
> +}
> +
> +/*
> +** f_si:
> +**   ldr     s([0-9]+), \[x0\]
> +**   ptrue   (p[0-7]).b, all
> +**   cnt     z\1.s, \2/m, z\1.s
> +**   umov    x0, v\1.d\[0\]
> +**   ret
> +*/
> +unsigned
> +f_si (unsigned int *a)
> +{
> +  return __builtin_popcountg (a[0]);
> +}
> +
> +/*
> +** f_di:
> +**   ldr     d([0-9]+), \[x0\]
> +**   ptrue   (p[0-7])\.b, all
> +**   cnt     z\1\.d, \2/m, z\1\.d
> +**   fmov    x0, d\1
> +**   ret
> +*/
> +unsigned
> +f_di (unsigned long *a)
> +{
> +  return __builtin_popcountg (a[0]);
> +}
