Wilco Dijkstra <wilco.dijks...@arm.com> writes:
> Although GCC should understand the limited range of clz/ctz/cls results,
> Combine sometimes behaves oddly and duplicates ctz to remove a
> sign extension. Avoid this by adding an explicit AND with 127 in the
> patterns. Deepsjeng performance improves by ~0.6%.
Could you go into more detail about what the before and after code looks
like, and what combine is doing? Like you say, this sounds like a
target-independent thing on face value.

Either way, something like this needs a testcase.

Thanks,
Richard

>
> Bootstrap OK.
>
> ChangeLog:
> 2020-02-03  Wilco Dijkstra  <wdijk...@arm.com>
>
>         * config/aarch64/aarch64.md (clz<mode>2): Mask the clz result.
>         (clrsb<mode>2): Likewise.
>         (ctz<mode>2): Likewise.
> --
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 5edc76ee14b55b2b4323530e10bd22b3ffca483e..7ff0536aac42957dbb7a15be766d35cc6725ac40 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4794,7 +4794,8 @@ (define_insn "*and_one_cmpl_<SHIFT:optab><mode>3_compare0_no_reuse"
>
>  (define_insn "clz<mode>2"
>    [(set (match_operand:GPI 0 "register_operand" "=r")
> -        (clz:GPI (match_operand:GPI 1 "register_operand" "r")))]
> +        (and:GPI (clz:GPI (match_operand:GPI 1 "register_operand" "r"))
> +                 (const_int 127)))]
>    ""
>    "clz\\t%<w>0, %<w>1"
>    [(set_attr "type" "clz")]
> @@ -4848,7 +4849,8 @@ (define_expand "popcount<mode>2"
>
>  (define_insn "clrsb<mode>2"
>    [(set (match_operand:GPI 0 "register_operand" "=r")
> -        (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
> +        (and:GPI (clrsb:GPI (match_operand:GPI 1 "register_operand" "r"))
> +                 (const_int 127)))]
>    ""
>    "cls\\t%<w>0, %<w>1"
>    [(set_attr "type" "clz")]
> @@ -4869,7 +4871,8 @@ (define_insn "rbit<mode>2"
>
>  (define_insn_and_split "ctz<mode>2"
>    [(set (match_operand:GPI 0 "register_operand" "=r")
> -        (ctz:GPI (match_operand:GPI 1 "register_operand" "r")))]
> +        (and:GPI (ctz:GPI (match_operand:GPI 1 "register_operand" "r"))
> +                 (const_int 127)))]
>    ""
>    "#"
>    "reload_completed"