> Pengxuan Zheng <quic_pzh...@quicinc.com> writes:
> > This is similar to the recent improvements to the Advanced SIMD
> > popcount expansion by using SVE. We can utilize SVE to generate more
> > efficient code for scalar mode popcount too.
> >
> >     PR target/113860
> >
> > gcc/ChangeLog:
> >
> >     * config/aarch64/aarch64-simd.md (popcount<mode>2): Update
> pattern to
> >     also support V1DI mode.
> >     * config/aarch64/aarch64.md (popcount<mode>2): Add TARGET_SVE
> support.
> >     * config/aarch64/iterators.md (VDQHSD_V1DI): New mode iterator.
> >     (SVE_VDQ_I): Add V1DI.
> >     (bitsize): Likewise.
> >     (VPRED): Likewise.
> >     (VEC_POP_MODE): New mode attribute.
> >     (vec_pop_mode): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >     * gcc.target/aarch64/popcnt11.c: New test.
> 
> Sorry for the slow review of this.  The main reason for putting it off was the
> use of V1DI, which always makes me nervous.
> 
> In particular:
> 
> > @@ -2284,7 +2286,7 @@ (define_mode_attr VPRED [(VNx16QI "VNx16BI")
> (VNx8QI "VNx8BI")
> >                      (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")
> >                      (V8QI "VNx8BI") (V16QI "VNx16BI")
> >                      (V4HI "VNx4BI") (V8HI "VNx8BI") (V2SI "VNx2BI")
> > -                    (V4SI "VNx4BI") (V2DI "VNx2BI")])
> > +                    (V4SI "VNx4BI") (V2DI "VNx2BI") (V1DI "VNx2BI")])
> >
> 
> it seems odd to have a predicate mode that contains more elements than the
> associated single-vector data mode.

I agree. I've added a new predicate mode VNx1BI in v2 of the patch. Please let 
me know if I missed anything.

> 
> The patch also extends the non-SVE SIMD popcount pattern for V1DI, but it
> doesn't look like that path works.  E.g. try the following with -march=armv8-a
> -fgimple -O2:
> 
> __Uint64x1_t __GIMPLE
> foo (__Uint64x1_t x)
> {
>   __Uint64x1_t z;
> 
>   z = .POPCOUNT (x);
>   return z;
> }

Good catch! Thanks, Richard. I've fixed this issue and added the example as a 
test case in v2.

v2: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663916.html

Thanks,
Pengxuan
> 
> Thanks,
> Richard
> 
> 
> >  ;; ...and again in lower case.
> >  (define_mode_attr vpred [(VNx16QI "vnx16bi") (VNx8QI "vnx8bi") @@
> > -2318,6 +2320,14 @@ (define_mode_attr VDOUBLE [(VNx16QI "VNx32QI")
> >                        (VNx4SI "VNx8SI") (VNx4SF "VNx8SF")
> >                        (VNx2DI "VNx4DI") (VNx2DF "VNx4DF")])
> >
> > +;; The Advanced SIMD modes of popcount corresponding to scalar modes.
> > +(define_mode_attr VEC_POP_MODE [(QI "V8QI") (HI "V4HI")
> > +                           (SI "V2SI") (DI "V1DI")])
> > +
> > +;; ...and again in lower case.
> > +(define_mode_attr vec_pop_mode [(QI "v8qi") (HI "v4hi")
> > +                           (SI "v2si") (DI "v1di")])
> > +
> >  ;; On AArch64 the By element instruction doesn't have a 2S variant.
> >  ;; However because the instruction always selects a pair of values
> > ;; The normal 3SAME instruction can be used here instead.
> > diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt11.c
> > b/gcc/testsuite/gcc.target/aarch64/popcnt11.c
> > new file mode 100644
> > index 00000000000..595b2f9eb93
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/popcnt11.c
> > @@ -0,0 +1,58 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -march=armv8.2-a+sve" } */
> > +/* { dg-final { check-function-bodies "**" "" "" } } */
> > +
> > +/*
> > +** f_qi:
> > +** ldr     b([0-9]+), \[x0\]
> > +** cnt     v\1.8b, v\1.8b
> > +** smov    w0, v\1.b\[0\]
> > +** ret
> > +*/
> > +unsigned
> > +f_qi (unsigned char *a)
> > +{
> > +  return __builtin_popcountg (a[0]);
> > +}
> > +
> > +/*
> > +** f_hi:
> > +** ldr     h([0-9]+), \[x0\]
> > +** ptrue   (p[0-7]).b, all
> > +** cnt     z\1.h, \2/m, z\1.h
> > +** smov    w0, v\1.h\[0\]
> > +** ret
> > +*/
> > +unsigned
> > +f_hi (unsigned short *a)
> > +{
> > +  return __builtin_popcountg (a[0]);
> > +}
> > +
> > +/*
> > +** f_si:
> > +** ldr     s([0-9]+), \[x0\]
> > +** ptrue   (p[0-7]).b, all
> > +** cnt     z\1.s, \2/m, z\1.s
> > +** umov    x0, v\1.d\[0\]
> > +** ret
> > +*/
> > +unsigned
> > +f_si (unsigned int *a)
> > +{
> > +  return __builtin_popcountg (a[0]);
> > +}
> > +
> > +/*
> > +** f_di:
> > +** ldr     d([0-9]+), \[x0\]
> > +** ptrue   (p[0-7])\.b, all
> > +** cnt     z\1\.d, \2/m, z\1\.d
> > +** fmov    x0, d\1
> > +** ret
> > +*/
> > +unsigned
> > +f_di (unsigned long *a)
> > +{
> > +  return __builtin_popcountg (a[0]);
> > +}

Reply via email to