> -----Original Message-----
> From: Kyrylo Tkachov <ktkac...@nvidia.com>
> Sent: Monday, June 10, 2024 12:26 AM
> To: Andrew Pinski (QUIC) <quic_apin...@quicinc.com>; gcc-patc...@gcc.gnu.org
> Subject: Re: [PATCH] aarch64: Improve popcount for bytes [PR113042]
> 
> Hi Andrew
> 
> -----Original Message-----
> From: Andrew Pinski <quic_apin...@quicinc.com>
> Date: Monday, 10 June 2024 at 06:05
> To: gcc-patches@gcc.gnu.org
> Cc: Andrew Pinski <quic_apin...@quicinc.com>
> Subject: [PATCH] aarch64: Improve popcount for bytes [PR113042]
> 
> 
> For popcount for bytes, we don't need the reduction addition
> after the vector cnt instruction as we are only counting one
> byte's popcount.
> This implements a new define_expand to handle that.
> 
> 
> Bootstrapped and tested on aarch64-linux-gnu with no
> regressions.
> 
> 
> PR target/113042
> 
> 
> gcc/ChangeLog:
> 
> 
> * config/aarch64/aarch64.md (popcountqi2): New pattern.
> 
> 
> gcc/testsuite/ChangeLog:
> 
> 
> * gcc.target/aarch64/popcnt5.c: New test.
> 
> 
> Signed-off-by: Andrew Pinski <quic_apin...@quicinc.com>
> ---
> gcc/config/aarch64/aarch64.md | 26 ++++++++++++++++++++++
> gcc/testsuite/gcc.target/aarch64/popcnt5.c | 19 ++++++++++++++++
> 2 files changed, 45 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt5.c
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 389a1906e23..ebaf7ec9970 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5358,6 +5358,32 @@ (define_expand "popcount<mode>2"
>    }
>  })
> 
> 
> +/* Popcount for byte can remove the reduction part after the popcount.
> +   For optimization reasons, enabling this for CSSC.  */
> +(define_expand "popcountqi2"
> +  [(set (match_operand:QI 0 "register_operand" "=w")
> +	(popcount:QI (match_operand:QI 1 "register_operand" "w")))]
> +  "TARGET_CSSC || TARGET_SIMD"
> +{
> +  rtx in = operands[1];
> +  rtx out = operands[0];
> +  if (TARGET_CSSC)
> +    {
> +      rtx tmp = gen_reg_rtx (SImode);
> +      rtx out1 = gen_reg_rtx (SImode);
> +      emit_insn (gen_zero_extendqisi2 (tmp, in));
> +      emit_insn (gen_popcountsi2 (out1, tmp));
> +      emit_move_insn (out, gen_lowpart (QImode, out1));
> +      DONE;
> +    }
> +  rtx v = gen_reg_rtx (V8QImode);
> +  rtx v1 = gen_reg_rtx (V8QImode);
> +  emit_move_insn (v, gen_lowpart (V8QImode, in));
> +  emit_insn (gen_popcountv8qi2 (v1, v));
> +  emit_move_insn (out, gen_lowpart (QImode, v1));
> +  DONE;
> +})
> 
> TBH I'd rather merge it with the GPI popcount pattern that
> looks almost identical. You could extend it with the ALLI
> iterator and handle HImode as well quite easily.

I had considered that beforehand, but I was aiming for a simpler patch at the
time.
Anyway, I have posted an updated version:
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654115.html

It also includes CSSC testcases to make sure the generated code is correct.
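
For illustration only (this is not part of either patch): the reason the
reduction can be dropped is that the vector cnt instruction already gives a
per-byte bit count, and the following addition only sums those lanes. For a
single byte, lane 0 holds the whole answer. Below is a minimal portable C
sketch of that argument. All function names are hypothetical models, not GCC
or ACLE APIs, and the upper lanes are assumed to be zero here so the reduced
sum is well defined; in the byte case they are simply never read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the vector cnt instruction on eight byte lanes:
   each lane is replaced by the number of set bits in that lane.  */
static void
model_cnt_v8qi (const uint8_t in[8], uint8_t out[8])
{
  for (int lane = 0; lane < 8; lane++)
    {
      uint8_t x = in[lane];
      uint8_t n = 0;
      while (x)
        {
          n = (uint8_t) (n + (x & 1u));
          x >>= 1;
        }
      out[lane] = n;
    }
}

/* Hypothetical model of the reduction addition: sum of all eight lanes.  */
static unsigned
model_addv (const uint8_t v[8])
{
  unsigned sum = 0;
  for (int lane = 0; lane < 8; lane++)
    sum += v[lane];
  return sum;
}

/* Byte popcount done the general way: cnt, then reduce across lanes.
   Lanes 1..7 are zero-initialized (an assumption of this model).  */
static unsigned
popcount8_with_reduction (uint8_t b)
{
  uint8_t v[8] = { b };
  uint8_t c[8];
  model_cnt_v8qi (v, c);
  return model_addv (c);
}

/* Byte popcount without the reduction: cnt, then read lane 0 only.  */
static unsigned
popcount8_lane0 (uint8_t b)
{
  uint8_t v[8] = { b };
  uint8_t c[8];
  model_cnt_v8qi (v, c);
  return c[0];
}

/* Exhaustively check that both forms agree for every byte value.  */
static void
self_check (void)
{
  for (unsigned b = 0; b < 256; b++)
    assert (popcount8_with_reduction ((uint8_t) b)
            == popcount8_lane0 ((uint8_t) b));
}
```

Calling self_check () walks all 256 byte values and asserts that the lane 0
result equals the reduced sum.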

Thanks,
Andrew Pinski



> Thanks,
> Kyrill
> 
> 
> +
> (define_insn "clrsb<mode>2"
> [(set (match_operand:GPI 0 "register_operand" "=r")
> (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt5.c b/gcc/testsuite/gcc.target/aarch64/popcnt5.c
> new file mode 100644
> index 00000000000..406369d9b29
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt5.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +/* PR target/113042 */
> +
> +#pragma GCC target "+nocssc"
> +
> +/*
> +** h8:
> +** ldr b[0-9]+, \[x0\]
> +** cnt v[0-9]+.8b, v[0-9]+.8b
> +** smov w0, v[0-9]+.b\[0\]
> +** ret
> +*/
> +/* We should not need the addv here since we only need a byte popcount.  */
> +
> +unsigned h8 (const unsigned char *a) {
> + return __builtin_popcountg (a[0]);
> +}
> --
> 2.42.0
> 
> 
> 
> 
