On Tue, Jan 14, 2025 at 4:35 PM Jennifer Schmitz via Gcc <gcc@gcc.gnu.org> wrote:
>
> We are working on improving codegen for the following test cases (for all
> integer types T):
>
> T foo (T x, T y)
> {
>   T diff = x - y;
>   return x > y ? diff : -diff;
> }
>
> T bar (T x, T y)
> {
>   T diff1 = x - y;
>   T diff2 = y - x;
>   return x > y ? diff1 : diff2;
> }
>
> For signed integers, we already proposed a patch (attached to [1]) that
> amends existing match.pd patterns in order to produce an ABS_EXPR (x - y).
>
> Now, we want to implement the optimization for unsigned integers for AArch64.
> For example, GCC compiles the function bar for uint8_t to (-O3 -fwrapv)
> bar_u8:
>     and     w3, w0, 255
>     and     w1, w1, 255
>     sub     w2, w3, w1
>     sub     w0, w1, w3
>     and     w2, w2, 255
>     cmp     w3, w1
>     and     w0, w0, 255
>     csel    w0, w0, w2, ls
>     ret
>
> whereas clang produces the desired sequence
> bar_u8:
>     and     w8, w0, #0xff
>     sub     w8, w8, w1, uxtb
>     cmp     w8, #0
>     cneg    w0, w8, mi
>     ret
>
> We would like to ask for guidance on where to best implement this
> optimization for unsigned integers. We have considered the following
> approaches:
> - also in match.pd as for the signed integers. However, the existing rule
> ABS (A) -> A for unsigned integers would fold ABS_EXPR (x - y) to (x - y),
> which is incorrect if x < y.
> Are there other ways to express absolute differences for unsigned types on
> gimple level?
Well, if x > y ? x - y : y - x can be transformed into
abs ((signed)(x - y)) == abs ((signed)(y - x)), then I'd suggest using
ABSU_EXPR ((signed)(x - y)) instead of ABS_EXPR (x - y). But I'm not sure
that's OK when the difference cannot be represented in a signed integer,
like for x == UINT_MAX and y == 0.

The desired assembly uses conditional negation (foo), so why not use that
form as the target? IIRC value-numbering will already try to CSE y - x
as -T when there's a T == x - y.

Richard.

> - in the aarch64 backend on RTL level: create an if_then_else RTX with
> (zero-extended) minus expressions in both arms. However, the combine-pass
> dump shows that we are also lacking an instruction pattern matching each
> arm:
> Failed to match this instruction:
> (set (reg:SI 108)
>     (zero_extend:SI (minus:QI (subreg:QI (reg/v:SI 103 [ x ]) 0)
>             (subreg:QI (reg/v:SI 104 [ y ]) 0))))
> which we would want to map to the following split of 3 instructions:
>     and     r103, r103, 255
>     sub     r108, r103, r104, uxtb
>     and     r108, r108, 255
> Do we want to add both those patterns?
>
> Any advice would be appreciated.
> Thanks,
> Jennifer
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114999