https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115975

            Bug ID: 115975
           Summary: sat_add, etc. vector patterns not done for aarch64
                    (sve)
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64-linux-gnu

+++ This bug was initially created as a clone of Bug #115974 +++

Take:
```
void f0(unsigned *__restrict__  a, unsigned * __restrict__ b)
{
        for(int i = 0;i < 1024;i ++)
        {
          unsigned tt;
          if (__builtin_add_overflow (a[i], b[i], &tt))
            tt = -1u;
          a[i] = tt;
        }
}
```

This should be vectorizable, as it already is on riscv and with clang.
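For reference, the loop body is a per-element unsigned saturating add. A minimal scalar sketch of the same operation in branching and branchless form is below. The function names and the `check_sat_add` helper are made up for illustration and are not part of the testcase. Whether GCC recognizes either form for SVE is exactly what this report is about, so the sketch only shows that the two forms compute the same value.

```c
#include <assert.h>
#include <limits.h>

/* Branching form, same shape as the loop body in the testcase. */
static unsigned sat_add_overflow(unsigned x, unsigned y)
{
    unsigned sum;
    if (__builtin_add_overflow(x, y, &sum))
        sum = -1u;              /* clamp to UINT_MAX on overflow */
    return sum;
}

/* Branchless form: (sum < x) is 1 exactly when the add wrapped around,
   and negating that as unsigned gives an all-ones mask to OR in. */
static unsigned sat_add_branchless(unsigned x, unsigned y)
{
    unsigned sum = x + y;
    return sum | -(unsigned)(sum < x);
}

/* Hypothetical helper: checks that both forms agree on some edge values. */
static void check_sat_add(void)
{
    static const unsigned v[] = { 0u, 1u, 2u, UINT_MAX / 2,
                                  UINT_MAX - 1, UINT_MAX };
    const unsigned n = sizeof v / sizeof v[0];

    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            assert(sat_add_overflow(v[i], v[j])
                   == sat_add_branchless(v[i], v[j]));
}
```

Each lane of the `uqadd` instructions in the LLVM output below performs this same clamp-to-maximum addition.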

LLVM's output:
```
.LBB1_1:                                // =>This Inner Loop Header: Depth=1
        ld1b    { z0.b }, p0/z, [x0, x8]
        ld1b    { z1.b }, p0/z, [x1, x8]
        add     x9, x0, x8
        add     x10, x1, x8
        ld1w    { z2.s }, p1/z, [x9, #1, mul vl]
        ld1w    { z3.s }, p1/z, [x10, #1, mul vl]
        uqadd   z0.s, z0.s, z1.s
        uqadd   z1.s, z2.s, z3.s
        st1b    { z0.b }, p0, [x0, x8]
        addvl   x8, x8, #2
        cmp     x8, #1, lsl #12                 // =4096
        st1w    { z1.s }, p1, [x9, #1, mul vl]
        b.ne    .LBB1_1
```
