On Tue, Jan 14, 2025 at 4:35 PM Jennifer Schmitz via Gcc
<gcc@gcc.gnu.org> wrote:
>
> We are working on improving codegen for the following test cases (for all 
> integer types T):
>
> T foo (T x, T y)
> {
>   T diff = x - y;
>   return x > y ? diff : -diff;
> }
>
> T bar (T x, T y)
> {
>   T diff1 = x - y;
>   T diff2 = y - x;
>   return x > y ? diff1 : diff2;
> }
>
> For signed integers, we already proposed a patch (attached to [1]) that
> amends existing match.pd patterns in order to produce an ABS_EXPR (x - y).
>
> Now, we want to implement the optimization for unsigned integers for AArch64.
> For example, GCC compiles the function bar for uint8_t (with -O3 -fwrapv) to:
> bar_u8:
>         and     w3, w0, 255
>         and     w1, w1, 255
>         sub     w2, w3, w1
>         sub     w0, w1, w3
>         and     w2, w2, 255
>         cmp     w3, w1
>         and     w0, w0, 255
>         csel    w0, w0, w2, ls
>         ret
>
> whereas Clang produces the desired sequence:
> bar_u8:
>         and     w8, w0, #0xff
>         sub     w8, w8, w1, uxtb
>         cmp     w8, #0
>         cneg    w0, w8, mi
>         ret
>
> We would like to ask for guidance on where to best implement this
> optimization for unsigned integers. We have considered the following
> approaches:
> - Also in match.pd, as for the signed integers. However, the existing rule
>   ABS (A) -> A for unsigned integers would fold ABS_EXPR (x - y) to (x - y),
>   which is incorrect if x < y.
>   Are there other ways to express absolute differences for unsigned types at
>   the GIMPLE level?
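
To make the unsigned semantics concrete, here is a minimal C sketch for T = uint8_t. The helper names bar_u8, folded_u8, clang_seq_u8 and check_bar_vs_clang_seq are hypothetical, chosen only for this illustration, and clang_seq_u8 is an assumed C reading of the Clang instruction sequence, not output from any tool. It contrasts bar with what the ABS (A) -> A fold would leave behind for ABS_EXPR (x - y).

```c
#include <assert.h>
#include <stdint.h>

/* bar from the test case with T = uint8_t.  Both subtractions are done
   in int after integer promotion; storing to uint8_t wraps modulo 256.  */
uint8_t bar_u8 (uint8_t x, uint8_t y)
{
  uint8_t diff1 = x - y;
  uint8_t diff2 = y - x;
  return x > y ? diff1 : diff2;
}

/* What ABS_EXPR (x - y) becomes once ABS (A) -> A fires for an unsigned
   A: just the wrapped difference, which is wrong whenever x < y.  */
uint8_t folded_u8 (uint8_t x, uint8_t y)
{
  return (uint8_t) (x - y);
}

/* Assumed C reading of Clang's sequence: zero-extend and subtract in
   32 bits (and + sub ..., uxtb), then negate if the result is negative
   (cmp #0 + cneg ..., mi).  */
uint32_t clang_seq_u8 (uint8_t x, uint8_t y)
{
  int32_t d = (int32_t) x - (int32_t) y;
  return (uint32_t) (d < 0 ? -d : d);
}

/* Exhaustively checks that bar_u8 and clang_seq_u8 agree on all inputs.  */
void check_bar_vs_clang_seq (void)
{
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      assert (bar_u8 ((uint8_t) x, (uint8_t) y)
              == clang_seq_u8 ((uint8_t) x, (uint8_t) y));
}
```

For example, bar_u8 (3, 5) and clang_seq_u8 (3, 5) both give 2, whereas folded_u8 (3, 5) gives 254, which is why the ABS (A) -> A rule cannot simply be applied to an unsigned difference.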

Well, iff x > y ? x - y : y - x -> abs ((signed)(x - y)) == abs ((signed)(y - x)),
then I'd suggest using ABSU_EXPR ((signed)(x - y)) instead of ABS_EXPR (x - y).

But I'm not sure that's OK when the difference cannot be represented in a
signed integer, like for x == UINT_MAX and y == 0.  The desired assembly uses
conditional negation (foo), so why not use that form as the target?  IIRC
value-numbering will already try to CSE y - x as -T when there's a T == x - y.
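
A rough C model of this concern, assuming T = uint8_t (so 255 plays the role of UINT_MAX) and modelling ABSU_EXPR as an absolute value returned in the unsigned type. The names as_s8, absu_diff_u8, foo_u8, bar_u8, count_absu_mismatches and check_foo_matches_bar are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* (signed)(x - y) for an 8-bit type: reinterpret the wrapped difference
   as two's complement, without an implementation-defined narrowing cast.  */
int as_s8 (uint8_t d)
{
  return d >= 128 ? (int) d - 256 : (int) d;
}

/* Model of ABSU_EXPR ((signed)(x - y)) with T = uint8_t.  The absolute
   value is returned in the unsigned type, so -128 maps to 128.  */
uint8_t absu_diff_u8 (uint8_t x, uint8_t y)
{
  int s = as_s8 ((uint8_t) (x - y));
  return (uint8_t) (s < 0 ? -s : s);
}

/* foo and bar from the test case with T = uint8_t.  */
uint8_t foo_u8 (uint8_t x, uint8_t y)
{
  uint8_t diff = x - y;
  return x > y ? diff : -diff;   /* -diff wraps modulo 256 on return.  */
}

uint8_t bar_u8 (uint8_t x, uint8_t y)
{
  uint8_t diff1 = x - y;
  uint8_t diff2 = y - x;
  return x > y ? diff1 : diff2;
}

/* Counts the input pairs on which the ABSU_EXPR form differs from bar.  */
unsigned count_absu_mismatches (void)
{
  unsigned n = 0;
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      if (absu_diff_u8 ((uint8_t) x, (uint8_t) y)
          != bar_u8 ((uint8_t) x, (uint8_t) y))
        n++;
  return n;
}

/* The conditional-negation form foo should agree with bar everywhere.  */
void check_foo_matches_bar (void)
{
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      assert (foo_u8 ((uint8_t) x, (uint8_t) y)
              == bar_u8 ((uint8_t) x, (uint8_t) y));
}
```

In this model absu_diff_u8 (255, 0) returns 1 while bar_u8 (255, 0) returns 255; the two disagree exactly when |x - y| > 128 (a difference of 128 still works because ABSU_EXPR (-128) is 128). The conditional-negation form foo_u8 agrees with bar_u8 on every input.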

Richard.

> - In the aarch64 backend at the RTL level: create an if_then_else RTX with
>   (zero-extended) minus expressions in both arms. However, the combine-pass
>   dump shows that we are also lacking an instruction pattern matching each
>   arm:
>     Failed to match this instruction:
>     (set (reg:SI 108)
>         (zero_extend:SI (minus:QI (subreg:QI (reg/v:SI 103 [ x ]) 0)
>                 (subreg:QI (reg/v:SI 104 [ y ]) 0))))
>   which we would want to map to the following split of 3 instructions:
>     and r103, r103, 255
>     sub r108, r103, r104, uxtb
>     and r108, r108, 255
>   Do we want to add both those patterns?
>
> Any advice would be appreciated.
> Thanks,
> Jennifer
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114999
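
As a sanity check on the proposed split, a small C model can compare the three instructions against the RTL they are meant to implement, using 32-bit registers. It assumes little-endian AArch64, where (subreg:QI ... 0) is the low byte; the function names rtl_zext_minus_qi, split_seq and check_split are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* (zero_extend:SI (minus:QI (subreg:QI x 0) (subreg:QI y 0))):
   subtract the low bytes in QImode (wrapping modulo 256), then
   zero-extend the result to SImode.  */
uint32_t rtl_zext_minus_qi (uint32_t x, uint32_t y)
{
  uint8_t lo = (uint8_t) ((uint8_t) x - (uint8_t) y);
  return lo;
}

/* The proposed three-instruction split, one statement per instruction.  */
uint32_t split_seq (uint32_t r103, uint32_t r104)
{
  r103 &= 255;                          /* and r103, r103, 255 */
  uint32_t r108 = r103 - (r104 & 255);  /* sub r108, r103, r104, uxtb */
  r108 &= 255;                          /* and r108, r108, 255 */
  return r108;
}

/* Compares both forms on every pair of low bytes, with junk in the upper
   bits that the subreg and the first and are expected to discard.  */
void check_split (void)
{
  const uint32_t junk[] = { 0u, 0x100u, 0xdeadbe00u };
  for (unsigned i = 0; i < 3; i++)
    for (uint32_t x = 0; x < 256; x++)
      for (uint32_t y = 0; y < 256; y++)
        {
          uint32_t rx = x | junk[i], ry = y | junk[2 - i];
          assert (split_seq (rx, ry) == rtl_zext_minus_qi (rx, ry));
        }
}
```

For example, x = 0x1234 and y = 0x56 give 222 from both functions; without the final and, r108 would instead hold the full 32-bit wrapped difference 0xffffffde.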
