https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116493
Bug ID: 116493
Summary: widening reduction add could be better
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64

Take:
```
unsigned int f(unsigned short *a)
{
  unsigned int t1 = 0;
  for(int i = 0; i < 16; i++)
    t1 += a[i];
  return t1;
}
```

GCC generates:
```
        ldp     q0, q31, [x0]
        uxtl    v30.4s, v0.4h
        uaddw2  v30.4s, v30.4s, v0.8h
        uaddw   v30.4s, v30.4s, v31.4h
        uaddw2  v30.4s, v30.4s, v31.8h
        addv    s31, v30.4s
        fmov    w0, s31
```

This could be improved in a few ways. First, the initial `uxtl`/`uaddw2` pair could be changed to:
```
        uxtl    v30.4s, v0.4h
        uxtl2   v30.4s, v0.8h
```

That is, simplify:
  vect_patt_20.8_2 = vect__4.6_1 w+ { 0, 0, 0, 0 };
into just:
  vect_patt_20.8_2 = (vector(8) unsigned int)vect__4.6_1;

And then I think we could handle the widening add better for the second case.
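
As a rough illustration of why the proposed simplification is value-preserving, here is a minimal scalar sketch in portable C. It models a widening add on four lanes and compares it, with an all-zero accumulator, against plain zero-extension. The helpers `widen_add4`, `widen4`, and `widen_add_zero_is_extend` are hypothetical names made up for this sketch; they are not GCC internals or intrinsics, and the lane count of four is just an assumption chosen to match `.4s`.

```c
#include <stdint.h>

/* Scalar reference: the reduction loop from the report. */
unsigned int f(unsigned short *a)
{
    unsigned int t1 = 0;
    for (int i = 0; i < 16; i++)
        t1 += a[i];
    return t1;
}

/* Hypothetical 4-lane model of a widening add:
   each 16-bit lane of x is zero-extended and added to acc. */
static void widen_add4(uint32_t out[4], const uint32_t acc[4],
                       const uint16_t x[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = acc[i] + (uint32_t)x[i];
}

/* Hypothetical 4-lane model of a plain zero-extending conversion. */
static void widen4(uint32_t out[4], const uint16_t x[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint32_t)x[i];
}

/* Returns 1 if a widening add onto a zero accumulator gives the same
   lanes as a plain zero-extension of x, and 0 otherwise. */
int widen_add_zero_is_extend(const uint16_t x[4])
{
    const uint32_t zero[4] = { 0, 0, 0, 0 };
    uint32_t via_add[4], via_ext[4];

    widen_add4(via_add, zero, x);
    widen4(via_ext, x);

    for (int i = 0; i < 4; i++)
        if (via_add[i] != via_ext[i])
            return 0;
    return 1;
}
```

This only checks the arithmetic identity on a scalar model; it says nothing about how the vectorizer or the aarch64 backend would need to be changed to apply it.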