Pengxuan Zheng <quic_pzh...@quicinc.com> writes:
> This is similar to the recent improvements to the Advanced SIMD popcount
> expansion by using SVE. We can utilize SVE to generate more efficient code for
> scalar mode popcount too.
>
> Changes since v1:
> * v2: Add a new VNx1BI mode and a new test case for V1DI.
> * v3: Abandon VNx1BI changes and add a new variant of aarch64_ptrue_reg.

Sorry for the slow review.  The patch looks good though.  OK with the
changes below:

> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> b/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> new file mode 100644
> index 00000000000..f086cae55a2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fgimple" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +

It's probably safer to add:

#pragma GCC target "+nosve"

here, so that we don't try to use the SVE instructions.

> +/*
> +** foo:
> +**   cnt     v0.8b, v0.8b
> +**   addv    b0, v0.8b

Nothing requires the temporary register to be v0, so this should be
something like:

    cnt     (v[0-9]+\.8b), v0\.8b
    addv    b0, \1

Thanks,
Richard

> +**   ret
> +*/
> +__Uint64x1_t __GIMPLE
> +foo (__Uint64x1_t x)
> +{
> +  __Uint64x1_t z;
> +
> +  z = .POPCOUNT (x);
> +  return z;
> +}
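
For context, the "scalar mode popcount" in the patch description is the
kind of operation shown below.  This is only a minimal sketch and is not
part of the patch: the function name popcount64 is hypothetical, and
which instructions the compiler emits for it depends on the target
options in effect (for example whether SVE is enabled).

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: a scalar 64-bit
   population count written with GCC's __builtin_popcountll builtin.  */
unsigned int
popcount64 (uint64_t x)
{
  /* The builtin returns the number of 1 bits in its argument.  */
  return (unsigned int) __builtin_popcountll (x);
}
```

Calling popcount64 (0xff) returns 8, since eight bits are set.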