Hi Richard, > Could you go into more detail about what the before and after code > looks like, and what combine is doing? Like you say, this sounds > like a target-independent thing on face value.
It is indeed, but it seems specific to instructions where we have range information which allows it to remove a redundant sign-extend. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93565 for the full details. > Either way, something like this needs a testcase. Sure I've added the testcase from pr93565, see below. Cheers, Wilco Although GCC should understand the limited range of clz/ctz/cls results, Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary sign extension. Avoid this by adding an explicit AND with 127 in the patterns. Deepsjeng performance improves by ~0.6%. Bootstrap OK. ChangeLog: 2020-02-04 Wilco Dijkstra <wdijk...@arm.com> PR rtl-optimization/93565 * config/aarch64/aarch64.md (clz<mode>2): Mask the clz result. (clrsb<mode>2): Likewise. (ctz<mode>2): Likewise. * gcc.target/aarch64/pr93565.c: New test. -- diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 5edc76ee14b55b2b4323530e10bd22b3ffca483e..7ff0536aac42957dbb7a15be766d35cc6725ac40 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -4794,7 +4794,8 @@ (define_insn "*and_one_cmpl_<SHIFT:optab><mode>3_compare0_no_reuse" (define_insn "clz<mode>2" [(set (match_operand:GPI 0 "register_operand" "=r") - (clz:GPI (match_operand:GPI 1 "register_operand" "r")))] + (and:GPI (clz:GPI (match_operand:GPI 1 "register_operand" "r")) + (const_int 127)))] "" "clz\\t%<w>0, %<w>1" [(set_attr "type" "clz")] @@ -4848,7 +4849,8 @@ (define_expand "popcount<mode>2" (define_insn "clrsb<mode>2" [(set (match_operand:GPI 0 "register_operand" "=r") - (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))] + (and:GPI (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")) + (const_int 127)))] "" "cls\\t%<w>0, %<w>1" [(set_attr "type" "clz")] @@ -4869,7 +4871,8 @@ (define_insn "rbit<mode>2" (define_insn_and_split "ctz<mode>2" [(set (match_operand:GPI 0 "register_operand" "=r") - (ctz:GPI (match_operand:GPI 1 "register_operand" "r")))] + (and:GPI (ctz:GPI (match_operand:GPI 1 "register_operand" "r")) + (const_int 127)))] "" "#" "reload_completed" diff --git a/gcc/testsuite/gcc.target/aarch64/pr93565.c b/gcc/testsuite/gcc.target/aarch64/pr93565.c new file mode 100644 index 0000000000000000000000000000000000000000..7200f80d1bb161f6a058cc6591f61b6b75cf1749 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr93565.c @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +static const unsigned long long magic = 0x03f08c5392f756cdULL; + +static const char table[64] = { + 0, 1, 12, 2, 13, 22, 17, 3, + 14, 33, 23, 36, 18, 58, 28, 4, + 62, 15, 34, 26, 24, 48, 50, 37, + 19, 55, 59, 52, 29, 44, 39, 5, + 63, 11, 21, 16, 32, 35, 57, 27, + 61, 25, 47, 49, 54, 51, 43, 38, + 10, 20, 31, 56, 60, 46, 53, 42, + 9, 30, 45, 41, 8, 40, 7, 6, +}; + +static inline int ctz1 (unsigned long b) +{ + unsigned long lsb = b & -b; + return table[(lsb * magic) >> 58]; +} + +void f (unsigned long x, int *p) +{ + if (x != 0) + { + int a = ctz1 (x); + *p = a | p[a]; + } +} + +/* { dg-final { scan-assembler-times "rbit\t" 1 } } */ +/* { dg-final { scan-assembler-times "clz\t" 1 } } */ +