> Pengxuan Zheng <quic_pzh...@quicinc.com> writes: > > This is similar to the recent improvements to the Advanced SIMD > > popcount expansion by using SVE. We can utilize SVE to generate more > > efficient code for scalar mode popcount too. > > > > PR target/113860 > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64-simd.md (popcount<mode>2): Update > pattern to > > also support V1DI mode. > > * config/aarch64/aarch64.md (popcount<mode>2): Add TARGET_SVE > support. > > * config/aarch64/iterators.md (VDQHSD_V1DI): New mode iterator. > > (SVE_VDQ_I): Add V1DI. > > (bitsize): Likewise. > > (VPRED): Likewise. > > (VEC_POP_MODE): New mode attribute. > > (vec_pop_mode): Likewise. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/aarch64/popcnt11.c: New test. > > Sorry for the slow review of this. The main reason for putting it off was the > use of V1DI, which always makes me nervous. > > In particular: > > > @@ -2284,7 +2286,7 @@ (define_mode_attr VPRED [(VNx16QI "VNx16BI") > (VNx8QI "VNx8BI") > > (VNx8DI "VNx2BI") (VNx8DF "VNx2BI") > > (V8QI "VNx8BI") (V16QI "VNx16BI") > > (V4HI "VNx4BI") (V8HI "VNx8BI") (V2SI "VNx2BI") > > - (V4SI "VNx4BI") (V2DI "VNx2BI")]) > > + (V4SI "VNx4BI") (V2DI "VNx2BI") (V1DI "VNx2BI")]) > > > > it seems odd to have a predicate mode that contains more elements than the > associated single-vector data mode.
I agree. I've added a new predicate mode VNx1BI in v2 of the patch. Please let me know if I missed anything. > > The patch also extends the non-SVE SIMD popcount pattern for V1DI, but it > doesn't look like that path works. E.g. try the following with -march=armv8-a > -fgimple -O2: > > __Uint64x1_t __GIMPLE > foo (__Uint64x1_t x) > { > __Uint64x1_t z; > > z = .POPCOUNT (x); > return z; > } Good catch! Thanks, Richard. I've fixed this issue and added the example as a test case in v2. v2: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663916.html Thanks, Pengxuan > > Thanks, > Richard > > > > ;; ...and again in lower case. > > (define_mode_attr vpred [(VNx16QI "vnx16bi") (VNx8QI "vnx8bi") @@ > > -2318,6 +2320,14 @@ (define_mode_attr VDOUBLE [(VNx16QI "VNx32QI") > > (VNx4SI "VNx8SI") (VNx4SF "VNx8SF") > > (VNx2DI "VNx4DI") (VNx2DF "VNx4DF")]) > > > > +;; The Advanced SIMD modes of popcount corresponding to scalar modes. > > +(define_mode_attr VEC_POP_MODE [(QI "V8QI") (HI "V4HI") > > + (SI "V2SI") (DI "V1DI")]) > > + > > +;; ...and again in lower case. > > +(define_mode_attr vec_pop_mode [(QI "v8qi") (HI "v4hi") > > + (SI "v2si") (DI "v1di")]) > > + > > ;; On AArch64 the By element instruction doesn't have a 2S variant. > > ;; However because the instruction always selects a pair of values > > ;; The normal 3SAME instruction can be used here instead. > > diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt11.c > > b/gcc/testsuite/gcc.target/aarch64/popcnt11.c > > new file mode 100644 > > index 00000000000..595b2f9eb93 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/aarch64/popcnt11.c > > @@ -0,0 +1,58 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -march=armv8.2-a+sve" } */ > > +/* { dg-final { check-function-bodies "**" "" "" } } */ > > + > > +/* > > +** f_qi: > > +** ldr b([0-9]+), \[x0\] > > +** cnt v\1.8b, v\1.8b > > +** smov w0, v\1.b\[0\] > > +** ret > > +*/ > > +unsigned > > +f_qi (unsigned char *a) > > +{ > > + return __builtin_popcountg (a[0]); > > +} > > + > > +/* > > +** f_hi: > > +** ldr h([0-9]+), \[x0\] > > +** ptrue (p[0-7]).b, all > > +** cnt z\1.h, \2/m, z\1.h > > +** smov w0, v\1.h\[0\] > > +** ret > > +*/ > > +unsigned > > +f_hi (unsigned short *a) > > +{ > > + return __builtin_popcountg (a[0]); > > +} > > + > > +/* > > +** f_si: > > +** ldr s([0-9]+), \[x0\] > > +** ptrue (p[0-7]).b, all > > +** cnt z\1.s, \2/m, z\1.s > > +** umov x0, v\1.d\[0\] > > +** ret > > +*/ > > +unsigned > > +f_si (unsigned int *a) > > +{ > > + return __builtin_popcountg (a[0]); > > +} > > + > > +/* > > +** f_di: > > +** ldr d([0-9]+), \[x0\] > > +** ptrue (p[0-7])\.b, all > > +** cnt z\1\.d, \2/m, z\1\.d > > +** fmov x0, d\1 > > +** ret > > +*/ > > +unsigned > > +f_di (unsigned long *a) > > +{ > > + return __builtin_popcountg (a[0]); > > +}