Hi Naveen,
On 12/12/16 03:16, Hurugalawadi, Naveen wrote:
Hi,
Please find attached the patch that implements the support for popcount
patterns in AArch64.
The implementation improves OVS-DPDK on ThunderX by 3%. It would have a
similar effect on other AArch64 targets.
Please review the patch and let us know if its okay?
Besides a few comments below this looks good to me if it has gone through
the normal required bootstrap and testing, but I can't approve.
It's at the discretion of the target maintainers/reviewers if they want to
accept it at this stage.
2016-12-12 Andrew Pinski <apin...@cavium.com>
gcc
* config/aarch64/aarch64.md (popcount<mode>2): New pattern.
gcc/testsuite
* gcc.target/aarch64/popcount.c : New Testcase.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 65eb326..c688ddc 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3785,6 +3785,39 @@
}
)
+/* Pop count be done via the pop count instruction in NEON. */
The rest of the MD file uses the term AdvSIMD. Also, the instrurction
is CNT rather than "pop count".
+/*
+ mov v.1d, x0
+ Cnt v1.8b, v.8b
+ Addv b2, v1.8b
+ Mov w0, v2.b[0]
+*/
Minor nit, but please use semicolons for MD file comments. The convention (at
least in the aarch64 md files)
is to use ";;" at the beginning of the line. Also, please use uppercase letters
for the assembly instructions
in the comment.
+(define_expand "popcount<mode>2"
+ [(match_operand:GPI 0 "register_operand")
+ (match_operand:GPI 1 "register_operand")]
+ "TARGET_SIMD"
+{
+ rtx v = gen_reg_rtx (V8QImode);
+ rtx v1 = gen_reg_rtx (V8QImode);
+ rtx r = gen_reg_rtx (QImode);
+ rtx in = operands[1];
+ rtx out = operands[0];
+ if(<MODE>mode == SImode)
+ {
+ rtx tmp;
+ tmp = gen_reg_rtx (DImode);
+ /* If we have SImode, zero extend to DImode, pop count does
+ not change if we have extra zeros. */
+ emit_insn (gen_zero_extendsidi2 (tmp, in));
+ in = tmp;
+ }
+ emit_move_insn (v, gen_lowpart (V8QImode, in));
+ emit_insn (gen_popcountv8qi2 (v1, v));
+ emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
+ emit_insn (gen_zero_extendqi<mode>2 (out, r));
+ DONE;
+})
+
(define_insn "clrsb<mode>2"
[(set (match_operand:GPI 0 "register_operand" "=r")
(clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcount.c
b/gcc/testsuite/gcc.target/aarch64/popcount.c
new file mode 100644
index 0000000..2d71168
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcount.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int foo(int x)
+{
+ return __builtin_popcount(x);
+}
+
+/* { dg-final { scan-assembler-not "popcountdi2" } } */
__builtin_popcount takes an unsigned int, so this should be scanning for
absence of popcountsi2 instead?
I suggest you also add a test for popcountdi2 using__builtin_popcountll.
Thanks,
Kyrill
+/* { dg-final { scan-assembler "cnt\t" } } */