We are working on improving codegen for the following test cases (for all integer types T):

  T foo (T x, T y)
  {
    T diff = x - y;
    return x > y ? diff : -diff;
  }

  T bar (T x, T y)
  {
    T diff1 = x - y;
    T diff2 = y - x;
    return x > y ? diff1 : diff2;
  }

For signed integers, we already proposed a patch (attached to [1]) that amends existing match.pd patterns in order to produce an ABS_EXPR (x - y).
Now, we want to implement the optimization for unsigned integers for AArch64. For example, GCC compiles the function bar for uint8_t to (-O3 -fwrapv)

  bar_u8:
        and     w3, w0, 255
        and     w1, w1, 255
        sub     w2, w3, w1
        sub     w0, w1, w3
        and     w2, w2, 255
        cmp     w3, w1
        and     w0, w0, 255
        csel    w0, w0, w2, ls
        ret

whereas clang produces the desired sequence

  bar_u8:
        and     w8, w0, #0xff
        sub     w8, w8, w1, uxtb
        cmp     w8, #0
        cneg    w0, w8, mi
        ret

We would like to ask for guidance on where to best implement this optimization for unsigned integers. We have considered the following approaches:

- also in match.pd as for the signed integers. However, the existing rule ABS (A) -> A for unsigned integers would fold ABS_EXPR (x - y) to (x - y), which is incorrect if x < y. Are there other ways to express absolute differences for unsigned types on gimple level?

- in the aarch64 backend on RTL level: create an if_then_else RTX with (zero-extended) minus expressions in both arms. However, the combine-pass dump shows that we are also lacking an instruction pattern matching each arm:

  Failed to match this instruction:
  (set (reg:SI 108)
      (zero_extend:SI (minus:QI (subreg:QI (reg/v:SI 103 [ x ]) 0)
              (subreg:QI (reg/v:SI 104 [ y ]) 0))))

  which we would want to map to the following split of 3 instructions:

        and     r103, r103, 255
        sub     r108, r103, r104, uxtb
        and     r108, r108, 255

  Do we want to add both those patterns?

Any advice would be appreciated.

Thanks,
Jennifer

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114999
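
For reference, the uint8_t case can be spelled out in plain C. This is only an illustrative sketch, not part of the proposed patch: the function names are hypothetical, and the widening variant is just one way to state the semantics that clang's sub + cneg sequence appears to compute. It shows why folding ABS_EXPR (x - y) to (x - y) in the unsigned type gives the wrong answer when x < y.

```c
#include <stdint.h>

/* Semantics of bar for uint8_t: pick the difference that does not wrap. */
uint8_t absdiff_u8_select (uint8_t x, uint8_t y)
{
  return x > y ? (uint8_t) (x - y) : (uint8_t) (y - x);
}

/* Same result via a wider signed type: subtract, then negate if negative.
   int32_t is wide enough for uint8_t operands; wider unsigned types would
   need a correspondingly wider intermediate (an assumption of this sketch).  */
uint8_t absdiff_u8_widen (uint8_t x, uint8_t y)
{
  int32_t d = (int32_t) x - (int32_t) y;
  return (uint8_t) (d < 0 ? -d : d);
}

/* What ABS (A) -> A would leave behind in the unsigned type: the plain
   truncated difference, which wraps around when x < y.  */
uint8_t absdiff_u8_folded (uint8_t x, uint8_t y)
{
  return (uint8_t) (x - y);
}
```

For x = 3, y = 5 the first two functions return 2, while the folded variant returns 254.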