https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116265
Bug ID: 116265
Summary: Missing optimization: Vectorization of modulo operator
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: tree-optimization
Assignee: jschmitz at gcc dot gnu.org
Reporter: jschmitz at gcc dot gnu.org
Target Milestone: ---
On aarch64 Neoverse-v2, GCC does not vectorize the modulo operator in loops if
the second operand is a memory reference, as in the test case below, even with
-Ofast.
I am planning to fix this and would like advice on where best to implement it.
void foo (unsigned int *x, unsigned int *y, int n)
{
  for (int i = 0; i < n; ++i)
    x[i] = x[i] % y[i];
}
compiles to
.L3:
        ldr     w5, [x0, x2]
        ldr     w4, [x1, x2]
        udiv    w3, w5, w4
        msub    w3, w3, w4, w5
        str     w3, [x0, x2]
        add     x2, x2, 4
        cmp     x6, x2
        bne     .L3
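For reference, the udiv/msub pair in the scalar loop corresponds to rewriting
a % b as a - (a / b) * b. The sketch below spells that decomposition out in C.
The function name foo_div_mulsub is made up for illustration; this is not a
proposed patch, and whether the vectorizer handles this form on a given target
is an assumption that would need checking.

```c
/* Hypothetical illustration, not GCC code: same result as x[i] % y[i],
   written as an explicit division followed by a multiply-subtract.
   As with the original test case, y[i] must be nonzero.  */
void foo_div_mulsub (unsigned int *x, const unsigned int *y, int n)
{
  for (int i = 0; i < n; ++i)
    {
      unsigned int q = x[i] / y[i];   /* corresponds to udiv */
      x[i] = x[i] - q * y[i];         /* corresponds to msub */
    }
}
```

Comparing the code generated for foo and foo_div_mulsub with the same options
may help narrow down whether the missing piece is the modulo operation itself
or the underlying vector division.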