https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66036
Bug ID: 66036
Summary: strided group loads are not vectorized
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

For example

struct Xd {
    double x;
    double y;
};

double testd (struct Xd *x, int stride, int n)
{
  int i;
  double sum = 0.;
  for (i = 0; i < n; ++i)
    {
      sum += x[i*stride].x;
      sum += x[i*stride].y;
    }
  return sum;
}

or similar cases without reduction (simple case)

int testi (int *p, short *q, int stride, int n)
{
  int i;
  for (i = 0; i < n; ++i)
    {
      q[i*4+0] = p[i*stride+0];
      q[i*4+1] = p[i*stride+1];
      q[i*4+2] = p[i*stride+2];
      q[i*4+3] = p[i*stride+3];
    }
}

or the more complex case

int testi2 (int *q, short *p, int stride, int n)
{
  int i;
  for (i = 0; i < n; ++i)
    {
      q[i*4+0] = p[i*stride+0];
      q[i*4+1] = p[i*stride+1];
      q[i*4+2] = p[i*stride+2];
      q[i*4+3] = p[i*stride+3];
    }
}

because here the SLP group has smaller-than-vector size and thus
requires two "scalar" loads and a vector build from them (x86_64
movhlps/movulps).

The more complex form happens in SPEC CPUv6.
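
To illustrate what "two scalar loads and a vector build" means for testi2,
here is a hand-written sketch in plain C. It is not the vectorizer's output
and not a proposed patch. It assumes a 2-byte short and a 128-bit vector, so
one group of four shorts fills only half a vector. The names testi2_scalar,
testi2_grouped and selfcheck_testi2 are made up for this illustration, the
return type is changed to void, and each 8-byte memcpy stands in for one
64-bit load.

```c
#include <assert.h>
#include <string.h>

/* Scalar reference: the testi2 loop from the report (void return here). */
void testi2_scalar(int *q, const short *p, int stride, int n)
{
    for (int i = 0; i < n; ++i) {
        q[i*4+0] = p[i*stride+0];
        q[i*4+1] = p[i*stride+1];
        q[i*4+2] = p[i*stride+2];
        q[i*4+3] = p[i*stride+3];
    }
}

/* Model of the strided group load: two iterations are combined so that
   two 4-short groups fill one 8-short "vector". */
void testi2_grouped(int *q, const short *p, int stride, int n)
{
    int i = 0;
    for (; i + 2 <= n; i += 2) {
        short v[8]; /* stands in for one 128-bit vector of shorts */

        /* Two half-vector loads from addresses that are stride apart. */
        memcpy(&v[0], &p[(i + 0) * stride], 4 * sizeof(short));
        memcpy(&v[4], &p[(i + 1) * stride], 4 * sizeof(short));

        /* Widen short -> int and store 8 contiguous results. */
        for (int k = 0; k < 8; ++k)
            q[i*4 + k] = v[k];
    }
    /* Scalar epilogue for an odd trip count. */
    for (; i < n; ++i)
        for (int k = 0; k < 4; ++k)
            q[i*4 + k] = p[i*stride + k];
}

/* Self-check: both versions must agree on a small strided input. */
void selfcheck_testi2(void)
{
    enum { N = 5, STRIDE = 7 };
    short p[N * STRIDE];
    int a[N * 4], b[N * 4];

    for (int i = 0; i < N * STRIDE; ++i)
        p[i] = (short)(i - 10);

    testi2_scalar(a, p, STRIDE, N);
    testi2_grouped(b, p, STRIDE, N);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

The inner k loop in testi2_grouped is the part that maps onto a single
vector widen-and-store once the 8-short vector has been built. The two
memcpy calls are the strided loads the report is about.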