Hi Kyrill,

On 17/12/2024 15:15, Kyrylo Tkachov wrote:
> We avoid using the __builtin_aarch64_* builtins in test cases as they are
> undocumented and we don’t make any guarantees about their stability to users.
> I’d prefer if the saturating operation was open-coded in C. I expect the midend
> machinery is smart enough to recognize the saturating logic for scalars by now?

Thanks for the detailed feedback. It's been really helpful, and I've gone ahead and implemented almost all of it. However, I'm struggling to find a pattern that's recognised for signed arithmetic. The following emits branching code:

int64_t __attribute__((noipa))
sadd64 (int64_t __a, int64_t __b)
{
  if (__a > 0) {
    if (__b > INT64_MAX - __a)
      return INT64_MAX;
  } else if (__b < INT64_MIN - __a) {
    return INT64_MIN;
  }
  return __a + __b;
}

Resulting assembly:

sadd64:
.LFB6:
        .cfi_startproc
        mov     x3, x0
        cmp     x0, 0
        ble     .L9
        mov     x2, 9223372036854775807
        sub     x4, x2, x0
        mov     x0, x2
        cmp     x4, x1
        blt     .L8
.L11:
        add     x0, x3, x1
.L8:
        ret
        .p2align 2,,3
.L9:
        mov     x2, -9223372036854775808
        sub     x0, x2, x0
        cmp     x0, x1
        ble     .L11
        mov     x0, x2
        ret

Is there a way to force this not to use branches by any chance? I'll keep looking and see if there are some patterns recently added to match that will work here. If I don't find something, would it be sufficient to use the scalar NEON intrinsics for this? And if so, would that mean the test should move to the Adv. SIMD directory?

Many thanks once again,
Akram
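
For reference, here is a minimal sketch of one other open-coded form that avoids the __builtin_aarch64_* builtins. It is an assumption for illustration, not part of the patch: the name sadd64_sketch is hypothetical, it relies on the generic GCC builtin __builtin_add_overflow, and it has not been checked here whether the midend recognises it as a saturating add or emits branch-free code for it.

```c
#include <stdint.h>

/* Hypothetical alternative to sadd64: saturating signed 64-bit addition.  */
int64_t
sadd64_sketch (int64_t a, int64_t b)
{
  int64_t sum;
  /* Signed overflow can only happen when a and b share a sign, so the sign
     of a selects the bound: INT64_MAX for a >= 0, INT64_MIN for a < 0.
     The conversion back to int64_t relies on GCC's modulo behaviour.  */
  int64_t sat = (int64_t) (((uint64_t) a >> 63) + (uint64_t) INT64_MAX);
  /* __builtin_add_overflow stores the wrapped sum and returns true on
     overflow.  */
  return __builtin_add_overflow (a, b, &sum) ? sat : sum;
}
```

On the boundary cases this should agree with sadd64 above, for example sadd64_sketch (INT64_MAX, 1) saturates to INT64_MAX and sadd64_sketch (INT64_MIN, -1) saturates to INT64_MIN.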
