Re: Optimizing codegen for absolute differences in AArch64

Jennifer Schmitz via Gcc Wed, 15 Jan 2025 06:45:24 -0800

> On 15 Jan 2025, at 08:38, Richard Biener <richard.guent...@gmail.com> wrote:
> 
> On Tue, Jan 14, 2025 at 4:35 PM Jennifer Schmitz via Gcc
> <gcc@gcc.gnu.org> wrote:
>> 
>> We are working on improving codegen for the following test cases (for all 
>> integer types T):
>> 
>> T foo (T x, T y)
>> {
>>  T diff = x - y;
>>  return x > y ? diff : -diff;
>> }
>> 
>> T bar (T x, T y)
>> {
>>  T diff1 = x - y;
>>  T diff2 = y - x;
>>  return x > y ? diff1 : diff2;
>> }
>> 
>> For signed integers, we already proposed a patch (attached to [1]) that
>> amends existing match.pd patterns in order to produce an ABS_EXPR (x - y).
>> 
>> Now, we want to implement the optimization for unsigned integers on
>> AArch64. For example, GCC compiles the function bar for uint8_t
>> (with -O3 -fwrapv) to:
>> bar_u8:
>>        and     w3, w0, 255
>>        and     w1, w1, 255
>>        sub     w2, w3, w1
>>        sub     w0, w1, w3
>>        and     w2, w2, 255
>>        cmp     w3, w1
>>        and     w0, w0, 255
>>        csel    w0, w0, w2, ls
>>        ret
>> 
>> whereas clang produces the desired sequence
>> bar_u8:
>>        and     w8, w0, #0xff
>>        sub     w8, w8, w1, uxtb
>>        cmp     w8, #0
>>        cneg    w0, w8, mi
>>        ret
>> 
>> We would like to ask for guidance on where best to implement this
>> optimization for unsigned integers. We have considered the following
>> approaches:
>> - In match.pd, as for the signed integers. However, the existing rule
>>  ABS (A) -> A for unsigned integers would fold ABS_EXPR (x - y) to
>>  (x - y), which is incorrect if x < y.
>>  Are there other ways to express absolute differences for unsigned
>>  types at the GIMPLE level?
> 
> Well, iff x > y ? x - y : y - x -> abs ((signed)(x - y)) ==
> abs ((signed)(y - x)), then I'd suggest using ABSU_EXPR ((signed)(x - y))
> instead of ABS_EXPR (x - y).
> 
> But I'm not sure that's OK when the difference cannot be represented
> in a signed integer, like for x == UINT_MAX and y == 0. The desired
> assembly uses conditional negation (foo), so why not use that form as
> the target? IIRC value-numbering will already try to CSE y - x as -T
> when there's a T == x - y.

Hi Richard,
thanks for responding to my email.
I tried using ABSU_EXPR ((signed)(x - y)), but, as you expected, it yields
incorrect results for some corner cases (when the difference cannot be
represented in the signed type, as in your x == UINT_MAX, y == 0 example).
So I'm not sure there is a good way to fold this at the GIMPLE level that
also gives correct results in all cases. That points to doing it at the
RTL level.
Thanks,
Jennifer
> 
> Richard.
> 
>> - In the aarch64 backend at the RTL level: create an if_then_else RTX
>>  with (zero-extended) minus expressions in both arms. However, the
>>  combine-pass dump shows that we also lack an instruction pattern
>>  matching each arm:
>>    Failed to match this instruction:
>>    (set (reg:SI 108)
>>        (zero_extend:SI (minus:QI (subreg:QI (reg/v:SI 103 [ x ]) 0)
>>                (subreg:QI (reg/v:SI 104 [ y ]) 0))))
>>  which we would want to map to the following split of 3 instructions:
>>    and r103, r103, 255
>>    sub r108, r103, r104, uxtb
>>    and r108, r108, 255
>>  Do we want to add both those patterns?
>> 
>> Any advice would be appreciated.
>> Thanks,
>> Jennifer
>> 
>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114999