https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118174
Bug ID: 118174
Summary: AArch64: Miscompilation at -O3
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

The following function is miscompiled for AArch64 when compiled with -O3:

  int
  foo (signed char *p1, signed char *p2)
  {
    int sum = 0;
    for (int i = 0; i < 32; i++)
      sum += __builtin_abs (p1[i] - p2[i]);
    return sum;
  }

The GIMPLE passes have transformed this to SAD_EXPR of 16-element vectors,
followed by a .REDUC_PLUS. This is then generated as the following assembly:

  foo:
        ldp     q3, q28, [x0]
        ldp     q2, q31, [x1]
        uabdl2  v29.8h, v3.16b, v2.16b
        uabdl2  v1.8h, v28.16b, v31.16b
        uabal   v29.8h, v3.8b, v2.8b
        uabal   v1.8h, v28.8b, v31.8b
        uaddlp  v29.4s, v29.8h
        uadalp  v29.4s, v1.8h
        uadalp  v29.4s, v1.8h
        uadalp  v29.4s, v1.8h
        addv    s31, v29.4s
        fmov    w0, s31
        ret

The reduction has been generated incorrectly as

        uaddlp  v29.4s, v29.8h
        uadalp  v29.4s, v1.8h
        uadalp  v29.4s, v1.8h
        uadalp  v29.4s, v1.8h
        addv    s31, v29.4s

which accumulates the register v1 multiple times.
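
A minimal sketch of a runtime check for the function above, in case it helps with reproducing the problem. Only foo comes from the report; foo_reference, fill_sample, run_foo_on_sample, run_reference_on_sample and verify_foo are hypothetical helpers, and the input data is an arbitrary choice (all values non-negative, with the second 16 bytes differing so that the second vector contributes to the sum). The noinline attributes and the volatile accumulator are assumptions intended to keep the compiler from folding the call away and to keep the reference loop scalar.

```c
#include <assert.h>

#define N 32

/* Function from the report.  The noinline attribute is an addition
   (an assumption of this sketch) so the call is not folded away.  */
__attribute__ ((noinline)) int
foo (signed char *p1, signed char *p2)
{
  int sum = 0;
  for (int i = 0; i < 32; i++)
    sum += __builtin_abs (p1[i] - p2[i]);
  return sum;
}

/* Hypothetical scalar reference.  The volatile accumulator is meant
   to stop the loop from being vectorized into the same reduction.  */
__attribute__ ((noinline)) int
foo_reference (const signed char *p1, const signed char *p2)
{
  volatile int sum = 0;
  for (int i = 0; i < N; i++)
    {
      int d = p1[i] - p2[i];
      sum += d < 0 ? -d : d;
    }
  return sum;
}

/* Arbitrary sample data: |p1[i] - p2[i]| == 2 * i for every i, with
   the sign of the difference alternating.  All values are in 0..93.  */
static void
fill_sample (signed char *p1, signed char *p2)
{
  for (int i = 0; i < N; i++)
    {
      p1[i] = (signed char) ((i & 1) ? i : 3 * i);
      p2[i] = (signed char) ((i & 1) ? 3 * i : i);
    }
}

int
run_foo_on_sample (void)
{
  signed char a[N], b[N];
  fill_sample (a, b);
  return foo (a, b);
}

int
run_reference_on_sample (void)
{
  signed char a[N], b[N];
  fill_sample (a, b);
  return foo_reference (a, b);
}

/* Aborts if the optimized function disagrees with the reference.  */
void
verify_foo (void)
{
  assert (run_foo_on_sample () == run_reference_on_sample ());
}
```

Calling verify_foo () from main and building with -O3 for AArch64 should abort on an affected compiler if the reduction is emitted as shown above; for this sample data the reference sum is 2 * (0 + 1 + ... + 31) = 992.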