Pengxuan Zheng <quic_pzh...@quicinc.com> writes: > This is similar to the recent improvements to the Advanced SIMD popcount > expansion by using SVE. We can utilize SVE to generate more efficient code for > scalar mode popcount too. > > PR target/113860 > > gcc/ChangeLog: > > * config/aarch64/aarch64-simd.md (popcount<mode>2): Update pattern to > also support V1DI mode. > * config/aarch64/aarch64.md (popcount<mode>2): Add TARGET_SVE support. > * config/aarch64/iterators.md (VDQHSD_V1DI): New mode iterator. > (SVE_VDQ_I): Add V1DI. > (bitsize): Likewise. > (VPRED): Likewise. > (VEC_POP_MODE): New mode attribute. > (vec_pop_mode): Likewise. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/popcnt11.c: New test.
Sorry for the slow review of this. The main reason for putting it off was the use of V1DI, which always makes me nervous. In particular: > @@ -2284,7 +2286,7 @@ (define_mode_attr VPRED [(VNx16QI "VNx16BI") (VNx8QI > "VNx8BI") > (VNx8DI "VNx2BI") (VNx8DF "VNx2BI") > (V8QI "VNx8BI") (V16QI "VNx16BI") > (V4HI "VNx4BI") (V8HI "VNx8BI") (V2SI "VNx2BI") > - (V4SI "VNx4BI") (V2DI "VNx2BI")]) > + (V4SI "VNx4BI") (V2DI "VNx2BI") (V1DI "VNx2BI")]) > it seems odd to have a predicate mode that contains more elements than the associated single-vector data mode. The patch also extends the non-SVE SIMD popcount pattern for V1DI, but it doesn't look like that path works. E.g. try the following with -march=armv8-a -fgimple -O2: __Uint64x1_t __GIMPLE foo (__Uint64x1_t x) { __Uint64x1_t z; z = .POPCOUNT (x); return z; } Thanks, Richard > ;; ...and again in lower case. > (define_mode_attr vpred [(VNx16QI "vnx16bi") (VNx8QI "vnx8bi") > @@ -2318,6 +2320,14 @@ (define_mode_attr VDOUBLE [(VNx16QI "VNx32QI") > (VNx4SI "VNx8SI") (VNx4SF "VNx8SF") > (VNx2DI "VNx4DI") (VNx2DF "VNx4DF")]) > > +;; The Advanced SIMD modes of popcount corresponding to scalar modes. > +(define_mode_attr VEC_POP_MODE [(QI "V8QI") (HI "V4HI") > + (SI "V2SI") (DI "V1DI")]) > + > +;; ...and again in lower case. > +(define_mode_attr vec_pop_mode [(QI "v8qi") (HI "v4hi") > + (SI "v2si") (DI "v1di")]) > + > ;; On AArch64 the By element instruction doesn't have a 2S variant. > ;; However because the instruction always selects a pair of values > ;; The normal 3SAME instruction can be used here instead. > diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt11.c > b/gcc/testsuite/gcc.target/aarch64/popcnt11.c > new file mode 100644 > index 00000000000..595b2f9eb93 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/popcnt11.c > @@ -0,0 +1,58 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -march=armv8.2-a+sve" } */ > +/* { dg-final { check-function-bodies "**" "" "" } } */ > + > +/* > +** f_qi: > +** ldr b([0-9]+), \[x0\] > +** cnt v\1.8b, v\1.8b > +** smov w0, v\1.b\[0\] > +** ret > +*/ > +unsigned > +f_qi (unsigned char *a) > +{ > + return __builtin_popcountg (a[0]); > +} > + > +/* > +** f_hi: > +** ldr h([0-9]+), \[x0\] > +** ptrue (p[0-7]).b, all > +** cnt z\1.h, \2/m, z\1.h > +** smov w0, v\1.h\[0\] > +** ret > +*/ > +unsigned > +f_hi (unsigned short *a) > +{ > + return __builtin_popcountg (a[0]); > +} > + > +/* > +** f_si: > +** ldr s([0-9]+), \[x0\] > +** ptrue (p[0-7]).b, all > +** cnt z\1.s, \2/m, z\1.s > +** umov x0, v\1.d\[0\] > +** ret > +*/ > +unsigned > +f_si (unsigned int *a) > +{ > + return __builtin_popcountg (a[0]); > +} > + > +/* > +** f_di: > +** ldr d([0-9]+), \[x0\] > +** ptrue (p[0-7])\.b, all > +** cnt z\1\.d, \2/m, z\1\.d > +** fmov x0, d\1 > +** ret > +*/ > +unsigned > +f_di (unsigned long *a) > +{ > + return __builtin_popcountg (a[0]); > +}