> -----Original Message----- > From: Kyrylo Tkachov <ktkac...@nvidia.com> > Sent: Monday, June 10, 2024 12:26 AM > To: Andrew Pinski (QUIC) <quic_apin...@quicinc.com>; gcc- > patc...@gcc.gnu.org > Subject: Re: [PATCH] aarch64: Improve popcount for bytes > [PR113042] > > Hi Andrew > > -----Original Message----- > From: Andrew Pinski <quic_apin...@quicinc.com > <mailto:quic_apin...@quicinc.com>> > Date: Monday, 10 June 2024 at 06:05 > To: "gcc-patches@gcc.gnu.org <mailto:gcc- > patc...@gcc.gnu.org>" <gcc-patches@gcc.gnu.org > <mailto:gcc-patches@gcc.gnu.org>> > Cc: Andrew Pinski <quic_apin...@quicinc.com > <mailto:quic_apin...@quicinc.com>> > Subject: [PATCH] aarch64: Improve popcount for bytes > [PR113042] > > > For popcount for bytes, we don't need the reduction addition > after the vector cnt instruction as we are only counting one > byte's popcount. > This implements a new define_expand to handle that. > > > Bootstrapped and tested on aarch64-linux-gnu with no > regressions. > > > PR target/113042 > > > gcc/ChangeLog: > > > * config/aarch64/aarch64.md (popcountqi2): New pattern. > > > gcc/testsuite/ChangeLog: > > > * gcc.target/aarch64/popcnt5.c: New test. > > > Signed-off-by: Andrew Pinski <quic_apin...@quicinc.com > <mailto:quic_apin...@quicinc.com>> > --- > gcc/config/aarch64/aarch64.md | 26 > ++++++++++++++++++++++ > gcc/testsuite/gcc.target/aarch64/popcnt5.c | 19 > ++++++++++++++++ > 2 files changed, 45 insertions(+) > create mode 100644 > gcc/testsuite/gcc.target/aarch64/popcnt5.c > > > diff --git a/gcc/config/aarch64/aarch64.md > b/gcc/config/aarch64/aarch64.md index > 389a1906e23..ebaf7ec9970 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -5358,6 +5358,32 @@ (define_expand > "popcount<mode>2" > } > }) > > > +/* Popcount for byte can remove the reduction part after the > popcount. > + For optimization reasons, enabling this for CSSC. */ > (define_expand > +"popcountqi2" > + [(set (match_operand:QI 0 "register_operand" "=w") > (popcount:QI > +(match_operand:QI 1 "register_operand" "w")))] > "TARGET_CSSC || > +TARGET_SIMD" > +{ > + rtx in = operands[1]; > + rtx out = operands[0]; > + if (TARGET_CSSC) > + { > + rtx tmp = gen_reg_rtx (SImode); > + rtx out1 = gen_reg_rtx (SImode); > + emit_insn (gen_zero_extendqisi2 (tmp, in)); emit_insn > +(gen_popcountsi2 (out1, tmp)); emit_move_insn (out, > gen_lowpart > +(QImode, out1)); DONE; } rtx v = gen_reg_rtx (V8QImode); > rtx v1 = > +gen_reg_rtx (V8QImode); emit_move_insn (v, gen_lowpart > (V8QImode, > +in)); emit_insn (gen_popcountv8qi2 (v1, v)); > emit_move_insn (out, > +gen_lowpart (QImode, v1)); DONE; > +}) > > TBH I'd rather merge it with the GPI popcount pattern that > looks almost identical. You could extend it with the ALLI > iterator and handle HImode as well quite easily.
I was thinking about that beforehand, but I was trying for the simplified patch at the time. Anyways I posted the updated version: https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654115.html And it includes the CSSC testcases too to make sure the generated code is correct. Thanks, Andrew Pinski > Thanks, > Kyrill > > > + > (define_insn "clrsb<mode>2" > [(set (match_operand:GPI 0 "register_operand" "=r") > (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))] > diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt5.c > b/gcc/testsuite/gcc.target/aarch64/popcnt5.c > new file mode 100644 > index 00000000000..406369d9b29 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/popcnt5.c > @@ -0,0 +1,19 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-final { check-function-bodies "**" "" } } */ > +/* PR target/113042 */ > + > +#pragma GCC target "+nocssc" > + > +/* > +** h8: > +** ldr b[0-9]+, \[x0\] > +** cnt v[0-9]+.8b, v[0-9]+.8b > +** smov w0, v[0-9]+.b\[0\] > +** ret > +*/ > +/* We should not need the addv here since we only need a > byte popcount. > +*/ > + > +unsigned h8 (const unsigned char *a) { > + return __builtin_popcountg (a[0]); > +} > -- > 2.42.0 > > > >