https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118174

            Bug ID: 118174
           Summary: AArch64: Miscompilation at -O3
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The following function is miscompiled for AArch64 when compiled with -O3:

int
foo (signed char *p1, signed char *p2)
{
  int sum = 0;
  for (int i = 0; i < 32; i++)
    sum += __builtin_abs (p1[i] - p2[i]);
  return sum;
}
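
A minimal self-checking sketch of the test case above, for running on the target. The names foo_ref and check_foo are hypothetical and not part of the report; the volatile accumulator in foo_ref is an assumption intended to keep that copy of the loop from being vectorized, so that it can serve as a reference.

```c
/* Function from the report; noinline keeps the call from being inlined
   into the checker.  */
__attribute__ ((noinline)) int
foo (signed char *p1, signed char *p2)
{
  int sum = 0;
  for (int i = 0; i < 32; i++)
    sum += __builtin_abs (p1[i] - p2[i]);
  return sum;
}

/* Hypothetical reference version.  The volatile accumulator is meant to
   stop the compiler from turning this loop into a vector reduction.  */
__attribute__ ((noinline)) int
foo_ref (signed char *p1, signed char *p2)
{
  volatile int sum = 0;
  for (int i = 0; i < 32; i++)
    {
      int d = p1[i] - p2[i];
      sum += d < 0 ? -d : d;
    }
  return sum;
}

/* Returns 1 if both versions agree on an arbitrary input pattern,
   0 otherwise.  All values fit in signed char.  */
int
check_foo (void)
{
  signed char a[32], b[32];
  for (int i = 0; i < 32; i++)
    {
      a[i] = (signed char) (i * 7 - 100);
      b[i] = (signed char) (50 - i * 5);
    }
  return foo (a, b) == foo_ref (a, b);
}
```

Calling check_foo from main and aborting when it returns 0 should be enough to show whether a given compiler build is affected.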

The GIMPLE passes transform the loop into SAD_EXPR operations on 16-element
vectors, followed by a .REDUC_PLUS. This is then expanded to the following
assembly:

foo:
        ldp     q3, q28, [x0]
        ldp     q2, q31, [x1]
        uabdl2  v29.8h, v3.16b, v2.16b
        uabdl2  v1.8h, v28.16b, v31.16b
        uabal   v29.8h, v3.8b, v2.8b
        uabal   v1.8h, v28.8b, v31.8b
        uaddlp  v29.4s, v29.8h
        uadalp  v29.4s, v1.8h
        uadalp  v29.4s, v1.8h
        uadalp  v29.4s, v1.8h
        addv    s31, v29.4s
        fmov    w0, s31
        ret

The reduction has been generated incorrectly as

        uaddlp  v29.4s, v29.8h
        uadalp  v29.4s, v1.8h
        uadalp  v29.4s, v1.8h
        uadalp  v29.4s, v1.8h
        addv    s31, v29.4s

which accumulates register v1 three times instead of once.
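
The effect on the result can be illustrated with a scalar model of the three instructions involved. This is only a sketch: the model_* helpers and the two reduce_* functions are hypothetical names, and the lane arrays stand in for the contents of v29.8h and v1.8h just before the reduction.

```c
#include <stdint.h>

/* Models UADDLP Vd.4S, Vn.8H: add adjacent unsigned 16-bit lanes into
   32-bit lanes.  */
static void
model_uaddlp (uint32_t out[4], const uint16_t in[8])
{
  for (int i = 0; i < 4; i++)
    out[i] = (uint32_t) in[2 * i] + in[2 * i + 1];
}

/* Models UADALP Vd.4S, Vn.8H: as above, but accumulating into Vd.  */
static void
model_uadalp (uint32_t acc[4], const uint16_t in[8])
{
  for (int i = 0; i < 4; i++)
    acc[i] += (uint32_t) in[2 * i] + in[2 * i + 1];
}

/* Models ADDV Sd, Vn.4S: sum of all four 32-bit lanes.  */
static uint32_t
model_addv (const uint32_t in[4])
{
  return in[0] + in[1] + in[2] + in[3];
}

/* Reduction with v1 accumulated once.  */
uint32_t
reduce_expected (const uint16_t v29[8], const uint16_t v1[8])
{
  uint32_t acc[4];
  model_uaddlp (acc, v29);
  model_uadalp (acc, v1);
  return model_addv (acc);
}

/* Reduction as it appears in the generated assembly: v1 is accumulated
   three times.  */
uint32_t
reduce_as_emitted (const uint16_t v29[8], const uint16_t v1[8])
{
  uint32_t acc[4];
  model_uaddlp (acc, v29);
  model_uadalp (acc, v1);
  model_uadalp (acc, v1);
  model_uadalp (acc, v1);
  return model_addv (acc);
}
```

With every lane of v29 set to 1 and every lane of v1 set to 2, reduce_expected returns 24 and reduce_as_emitted returns 56; the excess is twice the lane sum of v1.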
