https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115975
Bug ID: 115975
Summary: sat_add, etc. vector patterns not done for aarch64 (sve)
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64-linux-gnu

+++ This bug was initially created as a clone of Bug #115974 +++

Take:
```
void f0(unsigned *__restrict__ a, unsigned * __restrict__ b)
{
  for(int i = 0;i < 1024;i ++)
  {
    unsigned tt;
    if (__builtin_add_overflow (a[i], b[i], &tt))
      tt = -1u;
    a[i] = tt;
  }
}
```

This should be vectorizable. Like it is on riscv or with clang.

LLVM's output:
```
.LBB1_1:                                // =>This Inner Loop Header: Depth=1
        ld1b    { z0.b }, p0/z, [x0, x8]
        ld1b    { z1.b }, p0/z, [x1, x8]
        add     x9, x0, x8
        add     x10, x1, x8
        ld1w    { z2.s }, p1/z, [x9, #1, mul vl]
        ld1w    { z3.s }, p1/z, [x10, #1, mul vl]
        uqadd   z0.s, z0.s, z1.s
        uqadd   z1.s, z2.s, z3.s
        st1b    { z0.b }, p0, [x0, x8]
        addvl   x8, x8, #2
        cmp     x8, #1, lsl #12                 // =4096
        st1w    { z1.s }, p1, [x9, #1, mul vl]
        b.ne    .LBB1_1
```
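
For reference, the per-element operation in the loop is an unsigned saturating add, which is what `uqadd` computes per lane. Below is a minimal scalar sketch of that operation in two spellings: the overflow-builtin form from the testcase and a commonly used branchless equivalent. The function names (`sat_add_builtin`, `sat_add_branchless`, `check_sat_add_forms`) are made up for illustration and are not part of the testcase; whether a given compiler recognizes either spelling as a saturating add is not something this sketch shows.

```c
#include <assert.h>
#include <limits.h>

/* Form used in the testcase: detect overflow with the builtin, then clamp
   the result to the maximum unsigned value.  */
static unsigned sat_add_builtin(unsigned x, unsigned y)
{
  unsigned sum;
  if (__builtin_add_overflow(x, y, &sum))
    sum = -1u;
  return sum;
}

/* Branchless form (illustrative): unsigned addition wraps, so (sum < x)
   is true exactly when the add overflowed.  Negating that 0/1 value gives
   an all-zeros or all-ones mask, which is OR-ed into the sum.  */
static unsigned sat_add_branchless(unsigned x, unsigned y)
{
  unsigned sum = x + y;
  return sum | -(unsigned)(sum < x);
}

/* Hypothetical helper: check that both forms agree on a few edge cases.  */
static void check_sat_add_forms(void)
{
  static const unsigned samples[] = {
    0u, 1u, 2u, UINT_MAX / 2, UINT_MAX - 1, UINT_MAX
  };
  const int n = sizeof samples / sizeof samples[0];

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      assert(sat_add_builtin(samples[i], samples[j])
             == sat_add_branchless(samples[i], samples[j]));
}
```

Calling `check_sat_add_forms()` from a test driver exercises both forms on the boundary values around `UINT_MAX`, where the clamping matters.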