https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116128
--- Comment #6 from anlauf at gcc dot gnu.org --- Guided by the dump-tree for the inlining code, I played a little to see what kind of code the middle-end expects. To this end I used C code. The reference for the sum over a rank-1 array (given stride and count): double __attribute__ ((noipa,noinline)) sum_v1 (double x[], int stride, int n) { double sum = 0.; for (int i = 0; i < n; i++) sum += x[i*stride]; return sum; } An optimizer might see a pattern in the array access. I guess that for stride=1 it generates a separate path like: double __attribute__ ((noipa,noinline)) sum_v2 (double x[], int stride, int n) { double sum = 0.; if (stride == 1) { for (int i = 0; i < n; i++) sum += x[i]; } else { for (int i = 0; i < n; i++) sum += x[i*stride]; } return sum; } Unit stride should give faster loads (it does on vector machines). I wonder if gcc does that rather obvious transformation if stride can be 1 and the loop is "hot". Another optimization could be partial sums with reordering - which is what you mention - but only if associative math is enabled. Code might look like: double __attribute__ ((noipa,noinline)) sum_v4 (double x[], int stride, int n) { double sum; double tmp[4]; int i, off, n4; if (n <= 0) return 0.; for (int i = 0; i < 4; i++) tmp[i] = 0.; n4 = n & ~0x3; off = 0; for (i = 0; i < n4; i += 4) { tmp[0] += x[off]; tmp[1] += x[off+1*stride]; tmp[2] += x[off+2*stride]; tmp[3] += x[off+3*stride]; off += 4*stride; } sum = (tmp[0] + tmp[1]) + (tmp[2] + tmp[3]); for (; i < n; i++) sum += x[off + (i-n4)*stride]; return sum; } While this may look natural when memory bandwidth is "infinite" and SIMD performance is limited, I do not know what the situation is for common scalar architectures. Ideally, if annotation of a reduction loop (e.g. "associative math") could tell the middle-end that any of the above transformations is allowed, the generation of inline code could be rather target-agnostic. Doing code transforms as above when generating inline code in the frontend could end up in a nightmare.