Hi Richard,

> Could you go into more detail about what the before and after code
> looks like, and what combine is doing?  Like you say, this sounds
> like a target-independent thing on face value.

It is indeed, but it seems specific to instructions where we have range
information which allows it to remove a redundant sign-extend.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93565 for the full details.

> Either way, something like this needs a testcase.

Sure I've added the testcase from pr93565, see below.

Cheers,
Wilco


Although GCC should understand the limited range of clz/ctz/cls results,
Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary
sign extension.  Avoid this by adding an explicit AND with 127 in the
patterns. Deepsjeng performance improves by ~0.6%.

Bootstrap OK.

ChangeLog:
2020-02-04  Wilco Dijkstra  <wdijk...@arm.com>

        PR rtl-optimization/93565
        * config/aarch64/aarch64.md (clz<mode>2): Mask the clz result.
        (clrsb<mode>2): Likewise.
        (ctz<mode>2): Likewise.

        * gcc.target/aarch64/pr93565.c: New test.
--
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
5edc76ee14b55b2b4323530e10bd22b3ffca483e..7ff0536aac42957dbb7a15be766d35cc6725ac40
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4794,7 +4794,8 @@ (define_insn 
"*and_one_cmpl_<SHIFT:optab><mode>3_compare0_no_reuse"
 
 (define_insn "clz<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
-       (clz:GPI (match_operand:GPI 1 "register_operand" "r")))]
+       (and:GPI (clz:GPI (match_operand:GPI 1 "register_operand" "r"))
+                (const_int 127)))]
   ""
   "clz\\t%<w>0, %<w>1"
   [(set_attr "type" "clz")]
@@ -4848,7 +4849,8 @@ (define_expand "popcount<mode>2"
 
 (define_insn "clrsb<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
-        (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
+       (and:GPI (clrsb:GPI (match_operand:GPI 1 "register_operand" "r"))
+                (const_int 127)))]
   ""
   "cls\\t%<w>0, %<w>1"
   [(set_attr "type" "clz")]
@@ -4869,7 +4871,8 @@ (define_insn "rbit<mode>2"
 
 (define_insn_and_split "ctz<mode>2"
  [(set (match_operand:GPI           0 "register_operand" "=r")
-       (ctz:GPI (match_operand:GPI  1 "register_operand" "r")))]
+       (and:GPI (ctz:GPI (match_operand:GPI  1 "register_operand" "r"))
+               (const_int 127)))]
   ""
   "#"
   "reload_completed"
diff --git a/gcc/testsuite/gcc.target/aarch64/pr93565.c 
b/gcc/testsuite/gcc.target/aarch64/pr93565.c
new file mode 100644
index 
0000000000000000000000000000000000000000..7200f80d1bb161f6a058cc6591f61b6b75cf1749
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr93565.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+static const unsigned long long magic = 0x03f08c5392f756cdULL;
+
+static const char table[64] = {
+     0,  1, 12,  2, 13, 22, 17,  3,
+    14, 33, 23, 36, 18, 58, 28,  4,
+    62, 15, 34, 26, 24, 48, 50, 37,
+    19, 55, 59, 52, 29, 44, 39,  5,
+    63, 11, 21, 16, 32, 35, 57, 27,
+    61, 25, 47, 49, 54, 51, 43, 38,
+    10, 20, 31, 56, 60, 46, 53, 42,
+     9, 30, 45, 41,  8, 40,  7,  6,
+};
+
+static inline int ctz1 (unsigned long  b)
+{
+  unsigned long lsb = b & -b;
+  return table[(lsb * magic) >> 58];
+}
+
+void f (unsigned long x, int *p)
+{
+  if (x != 0)
+    {
+      int a = ctz1 (x);
+      *p = a | p[a];
+    }
+}
+
+/* { dg-final { scan-assembler-times "rbit\t" 1 } } */
+/* { dg-final { scan-assembler-times "clz\t" 1 } } */
+

Reply via email to