https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116493

            Bug ID: 116493
           Summary: widening reduction add could be better
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Take:
```
unsigned int f(unsigned short *a)
{
  unsigned int t1 = 0;
  for(int i = 0; i < 16; i++)
    t1 += a[i];
  return t1;
}
```

GCC generates:
```
        ldp     q0, q31, [x0]
        uxtl    v30.4s, v0.4h
        uaddw2  v30.4s, v30.4s, v0.8h
        uaddw   v30.4s, v30.4s, v31.4h
        uaddw2  v30.4s, v30.4s, v31.8h
        addv    s31, v30.4s
        fmov    w0, s31
```
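
For readers less familiar with the widening instructions, here is a rough scalar sketch in C of what the sequence above computes. The `v8h`/`v4s` structs and the helpers named after the instructions are hypothetical stand-ins for illustration, not GCC or ACLE APIs. They only model the lane arithmetic, assuming `ldp` places `a[0..7]` in `q0` and `a[8..15]` in `q31`.

```c
#include <stdint.h>

/* Hypothetical scalar stand-ins for a 128-bit register viewed as
   8 x u16 (v8h) or 4 x u32 (v4s). */
typedef struct { uint16_t h[8]; } v8h;
typedef struct { uint32_t s[4]; } v4s;

/* uxtl Vd.4s, Vn.4h: zero-extend the low four halfwords of n. */
static v4s uxtl(v8h n)
{
  v4s d;
  for (int i = 0; i < 4; i++)
    d.s[i] = n.h[i];
  return d;
}

/* uaddw Vd.4s, Vn.4s, Vm.4h: add the zero-extended low four halfwords of m. */
static v4s uaddw(v4s n, v8h m)
{
  v4s d;
  for (int i = 0; i < 4; i++)
    d.s[i] = n.s[i] + m.h[i];
  return d;
}

/* uaddw2 Vd.4s, Vn.4s, Vm.8h: add the zero-extended high four halfwords of m. */
static v4s uaddw2(v4s n, v8h m)
{
  v4s d;
  for (int i = 0; i < 4; i++)
    d.s[i] = n.s[i] + m.h[i + 4];
  return d;
}

/* addv Sd, Vn.4s: sum the four 32-bit lanes. */
static uint32_t addv(v4s n)
{
  return n.s[0] + n.s[1] + n.s[2] + n.s[3];
}

/* Mirrors the emitted sequence instruction by instruction. */
uint32_t f_model(const uint16_t *a)
{
  v8h q0, q31;
  for (int i = 0; i < 8; i++) {     /* ldp q0, q31, [x0] */
    q0.h[i] = a[i];
    q31.h[i] = a[i + 8];
  }
  v4s v30 = uxtl(q0);               /* lanes a[0..3] */
  v30 = uaddw2(v30, q0);            /* + a[4..7]     */
  v30 = uaddw(v30, q31);            /* + a[8..11]    */
  v30 = uaddw2(v30, q31);           /* + a[12..15]   */
  return addv(v30);
}
```

Under these assumptions `f_model` returns the same value as `f` for any 16-element input, since every element is zero-extended exactly once before being added.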

This could be improved in a few ways. First, the initial `uxtl`/`uaddw2` pair
could be changed to:
```
        uxtl    v30.4s, v0.4h
        uxtl2   v30.4s, v0.8h
```

That is, simplify:
  vect_patt_20.8_2 = vect__4.6_1 w+ { 0, 0, 0, 0 };

into just:
  vect_patt_20.8_2 = (vector(8) unsigned int)vect__4.6_1;

And then I think we could handle the widening add better for the second case.
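
As a sanity check on the `w+ { 0, 0, 0, 0 }` simplification, the following C sketch compares the reduction total of the two forms. It is only a scalar model: the function names are hypothetical, and the lane pairing in `widen_sum_zero` (lane `i` with lane `i + 4`) is an assumption taken from the `uxtl`/`uaddw2` pair in the generated code, not something GIMPLE guarantees.

```c
#include <stdint.h>

/* Hypothetical model of `v w+ { 0, 0, 0, 0 }`: widen eight u16 lanes and
   accumulate them into four u32 lanes that start at zero.  The pairing of
   lane i with lane i + 4 mirrors uxtl + uaddw2 and is an assumption. */
static void widen_sum_zero(uint32_t out[4], const uint16_t v[8])
{
  for (int i = 0; i < 4; i++)
    out[i] = 0u + (uint32_t)v[i] + (uint32_t)v[i + 4];
}

/* Hypothetical model of `(vector(8) unsigned int) v`: plain zero-extension. */
static void widen_convert(uint32_t out[8], const uint16_t v[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = v[i];
}

/* Sum n 32-bit lanes, as the final reduction would. */
static uint32_t lane_sum(const uint32_t *x, int n)
{
  uint32_t s = 0;
  for (int i = 0; i < n; i++)
    s += x[i];
  return s;
}

/* Returns 1 when both forms reduce to the same total for v. */
int same_reduction(const uint16_t v[8])
{
  uint32_t acc[4], wide[8];
  widen_sum_zero(acc, v);
  widen_convert(wide, v);
  return lane_sum(acc, 4) == lane_sum(wide, 8);
}
```

The zero accumulator contributes nothing to the total, so in this model both forms feed the same sum into the final reduction.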
