https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116265

            Bug ID: 116265
           Summary: Missing optimization: Vectorization of modulo operator
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: jschmitz at gcc dot gnu.org
          Reporter: jschmitz at gcc dot gnu.org
  Target Milestone: ---

On aarch64 Neoverse-v2, GCC does not vectorize the modulo operator in loops if
the second operand is a memory reference, as in the test case below, even with
-Ofast.

I am planning to fix this and would like advice on where best to implement it.

void foo (unsigned int *x, unsigned int *y, int n)
{
  for (int i = 0; i < n; ++i)
    x[i] = x[i] % y[i];
}

compiles to the following scalar loop body:

        ldr     w5, [x0, x2]
        ldr     w4, [x1, x2]
        udiv    w3, w5, w4
        msub    w3, w3, w4, w5
        str     w3, [x0, x2]
        add     x2, x2, 4
        cmp     x6, x2
        bne     .L3
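
For reference, the udiv/msub pair above computes the remainder as
x - (x / y) * y. A minimal sketch of the same lowering written out by hand in
C is given here. The names foo_lowered and check_lowered_matches_modulo are
hypothetical and exist only for illustration. Whether the compiler vectorizes
this form depends on the target providing a vector integer divide (SVE is
assumed to here); the sketch does not describe the eventual fix.

```c
#include <assert.h>

/* Original loop from the report. */
void foo (unsigned int *x, unsigned int *y, int n)
{
  for (int i = 0; i < n; ++i)
    x[i] = x[i] % y[i];
}

/* Hypothetical hand-lowered variant: x % y == x - (x / y) * y.
   This mirrors the scalar udiv + msub sequence shown above.  */
void foo_lowered (unsigned int *x, unsigned int *y, int n)
{
  for (int i = 0; i < n; ++i)
    {
      unsigned int q = x[i] / y[i];   /* quotient, as computed by udiv */
      x[i] = x[i] - q * y[i];         /* remainder, as computed by msub */
    }
}

/* Hypothetical self-check: both variants agree on a few sample values.
   Divisors are non-zero because division by zero is undefined in C.  */
void check_lowered_matches_modulo (void)
{
  unsigned int a[] = { 0u, 7u, 100u, 4294967295u, 12345u };
  unsigned int b[] = { 0u, 7u, 100u, 4294967295u, 12345u };
  unsigned int d[] = { 1u, 3u, 100u, 10u, 4294967295u };

  foo (a, d, 5);
  foo_lowered (b, d, 5);
  for (int i = 0; i < 5; ++i)
    assert (a[i] == b[i]);
}
```

The identity holds exactly for unsigned operands because q * y never exceeds
x, so no wraparound occurs in the subtraction.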
