Wilco Dijkstra <wilco.dijks...@arm.com> writes:
> Although GCC should understand the limited range of clz/ctz/cls results,
> Combine sometimes behaves oddly and duplicates ctz to remove a
> sign extension.  Avoid this by adding an explicit AND with 127 in the
> patterns. Deepsjeng performance improves by ~0.6%.

Could you go into more detail about what the before and after code
looks like, and what combine is doing?  As you say, this sounds
like a target-independent issue at face value.

Either way, something like this needs a testcase.
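
For illustration only, here is a rough C sketch of the kind of source
that might exercise this.  It is not taken from the patch or from
Deepsjeng, and the function names are hypothetical.  It only shows why
the explicit AND with 127 should be a no-op: a 64-bit count is always
in the range [0, 63], so sign-extending, zero-extending, and masking
the result all give the same value.  Whether this particular source
makes combine duplicate the ctz is an assumption that would need
checking against the generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: the int result of the builtin is widened to
   64 bits, which is where a sign extension can appear in the RTL.  */
int64_t ctz_widened(uint64_t x)
{
    assert(x != 0);               /* __builtin_ctzll(0) is undefined */
    int n = __builtin_ctzll(x);   /* always in [0, 63] */
    return (int64_t)n;            /* sign extension of an int */
}

/* The same value with the explicit mask that the patch adds to the
   patterns.  The mask never changes a result in [0, 63].  */
int64_t ctz_masked(uint64_t x)
{
    assert(x != 0);
    return (int64_t)(__builtin_ctzll(x) & 127);
}

/* Same idea for count-leading-zeros.  */
int64_t clz_widened(uint64_t x)
{
    assert(x != 0);               /* __builtin_clzll(0) is undefined */
    return (int64_t)__builtin_clzll(x);
}
```

A real testcase would presumably scan the assembly for the absence of
a duplicated instruction or a redundant extension rather than compare
values at run time.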

Thanks,
Richard

>
> Bootstrap OK.
>
> ChangeLog:
> 2020-02-03  Wilco Dijkstra  <wdijk...@arm.com>
>
> * config/aarch64/aarch64.md (clz<mode>2): Mask the clz result.
> (clrsb<mode>2): Likewise.
> (ctz<mode>2): Likewise.
> --
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 5edc76ee14b55b2b4323530e10bd22b3ffca483e..7ff0536aac42957dbb7a15be766d35cc6725ac40 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4794,7 +4794,8 @@ (define_insn "*and_one_cmpl_<SHIFT:optab><mode>3_compare0_no_reuse"
>
>  (define_insn "clz<mode>2"
>    [(set (match_operand:GPI 0 "register_operand" "=r")
> -        (clz:GPI (match_operand:GPI 1 "register_operand" "r")))]
> +        (and:GPI (clz:GPI (match_operand:GPI 1 "register_operand" "r"))
> +                 (const_int 127)))]
>    ""
>    "clz\\t%<w>0, %<w>1"
>    [(set_attr "type" "clz")]
> @@ -4848,7 +4849,8 @@ (define_expand "popcount<mode>2"
>
>  (define_insn "clrsb<mode>2"
>    [(set (match_operand:GPI 0 "register_operand" "=r")
> -        (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
> +        (and:GPI (clrsb:GPI (match_operand:GPI 1 "register_operand" "r"))
> +                 (const_int 127)))]
>    ""
>    "cls\\t%<w>0, %<w>1"
>    [(set_attr "type" "clz")]
> @@ -4869,7 +4871,8 @@ (define_insn "rbit<mode>2"
>
>  (define_insn_and_split "ctz<mode>2"
>   [(set (match_operand:GPI           0 "register_operand" "=r")
> -       (ctz:GPI (match_operand:GPI  1 "register_operand" "r")))]
> +       (and:GPI (ctz:GPI (match_operand:GPI  1 "register_operand" "r"))
> +                (const_int 127)))]
>    ""
>    "#"
>    "reload_completed"
