https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323
Bug ID: 67323
Summary: Use non-unit stride loads by preference when applicable
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: michael.collison at linaro dot org
Target Milestone: ---

On arm targets, the following code fails to generate a vld3:

struct pixel {
  char r,g,b;
};

void t2(int len, struct pixel * __restrict p, struct pixel * __restrict x)
{
  len = len & ~31;
  for (int i = 0; i < len; i++){
    p[i].r = x[i].r * 2;
    p[i].g = x[i].g * 3;
    p[i].b = x[i].b * 4;
  }
}

Yet the same code with the line

    p[i].g = x[i].g * 3;

changed to

    p[i].g = x[i].g;

does generate a vld3.
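For context, a NEON vld3 loads interleaved three-element structures and de-interleaves them into three vectors, so each vector holds every third byte of the input (a stride-3 load). vst3 performs the inverse interleaving store. The sketch below is a portable scalar emulation of roughly the loop shape a load-lanes vectorization of t2 would take. It is not NEON intrinsics and not GCC's actual output. The helper names load3, store3 and t2_lanes are hypothetical, and LANES = 16 is an assumption corresponding to one 128-bit vector of 8-bit elements.

```c
#include <assert.h>

struct pixel { char r, g, b; };

/* Assumption: 16 lanes of 8-bit elements, i.e. one 128-bit NEON vector. */
#define LANES 16

/* The scalar loop from the report, kept as the reference. */
void t2(int len, struct pixel *__restrict p, struct pixel *__restrict x)
{
  len = len & ~31;
  for (int i = 0; i < len; i++) {
    p[i].r = x[i].r * 2;
    p[i].g = x[i].g * 3;
    p[i].b = x[i].b * 4;
  }
}

/* Hypothetical helper emulating a vld3-style load: LANES consecutive
   {r,g,b} structures are de-interleaved into three arrays, one per
   channel (each array gathers every third byte of the input). */
static void load3(const struct pixel *src,
                  char r[LANES], char g[LANES], char b[LANES])
{
  for (int l = 0; l < LANES; l++) {
    r[l] = src[l].r;
    g[l] = src[l].g;
    b[l] = src[l].b;
  }
}

/* Hypothetical helper emulating a vst3-style store: the inverse of
   load3, re-interleaving three channel arrays into {r,g,b} structures. */
static void store3(struct pixel *dst,
                   const char r[LANES], const char g[LANES], const char b[LANES])
{
  for (int l = 0; l < LANES; l++) {
    dst[l].r = r[l];
    dst[l].g = g[l];
    dst[l].b = b[l];
  }
}

/* Hypothetical load-lanes form of t2: per block of LANES pixels, one
   de-interleaving load, three independent lane-wise multiplies (each
   channel with its own constant), and one interleaving store. */
void t2_lanes(int len, struct pixel *__restrict p,
              const struct pixel *__restrict x)
{
  len = len & ~31;
  assert(len % LANES == 0);  /* a multiple of 32 needs no scalar epilogue */
  for (int i = 0; i < len; i += LANES) {
    char r[LANES], g[LANES], b[LANES];
    load3(&x[i], r, g, b);
    for (int l = 0; l < LANES; l++) {
      r[l] = r[l] * 2;
      g[l] = g[l] * 3;
      b[l] = b[l] * 4;
    }
    store3(&p[i], r, g, b);
  }
}
```

Because 40 & ~31 is 32, a call such as t2_lanes(40, out, in) processes two 16-pixel blocks and leaves the last eight pixels untouched, the same as t2. The per-channel multiplies are independent once the data is de-interleaved, so differing constants (2, 3, 4) pose no problem for this shape.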