https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116265
Bug ID: 116265
Summary: Missing optimization: Vectorization of modulo operator
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: tree-optimization
Assignee: jschmitz at gcc dot gnu.org
Reporter: jschmitz at gcc dot gnu.org
Target Milestone: ---

On aarch64 Neoverse-v2, GCC does not vectorize the modulo operator in loops if the second operand is a memory reference, as in the test case below, even with -Ofast. I am planning to fix this and would like advice on where best to implement it.

    void foo (unsigned int *x, unsigned int *y, int n)
    {
      for (int i = 0; i < n; ++i)
        x[i] = x[i] % y[i];
    }

compiles to

    ldr   w5, [x0, x2]
    ldr   w4, [x1, x2]
    udiv  w3, w5, w4
    msub  w3, w3, w4, w5
    str   w3, [x0, x2]
    add   x2, x2, 4
    cmp   x6, x2
    bne   .L3
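
For reference, the udiv/msub pair in the scalar loop above computes the remainder as x - (x / y) * y. The sketch below writes that identity out in C next to the original test case. The names foo_div_mul_sub and check_equivalence are hypothetical and used only for illustration. This is not a statement of how GCC should or will implement the fix, only the arithmetic that a vectorized form could build on if the target provides a vector integer divide.

```c
#include <assert.h>

/* Test case from the report: element-wise unsigned remainder. */
void foo (unsigned int *x, unsigned int *y, int n)
{
  for (int i = 0; i < n; ++i)
    x[i] = x[i] % y[i];
}

/* Hypothetical rewrite (illustration only): the same remainder spelled
   as divide, multiply, subtract, mirroring the udiv + msub sequence.  */
void foo_div_mul_sub (unsigned int *x, unsigned int *y, int n)
{
  for (int i = 0; i < n; ++i)
    {
      unsigned int q = x[i] / y[i];   /* udiv */
      x[i] = x[i] - q * y[i];         /* msub */
    }
}

/* Hypothetical helper: asserts that both forms agree for one pair of
   operands.  y must be nonzero, as division by zero is undefined.  */
void check_equivalence (unsigned int x, unsigned int y)
{
  assert (y != 0);
  assert (x % y == x - (x / y) * y);
}
```

Both functions have undefined behavior if any y[i] is zero, exactly like the original test case, so the rewrite does not change which inputs are valid.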