https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112612
Bug ID: 112612
Summary: [Missed Optimization] Holding on the loop variable rather than a derived value which can replace it
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: eyalroz1 at gmx dot com
Target Milestone: ---

Consider the following function:

    void foo(int* __restrict__ a)
    {
        int i, val;
        for (i = 0; i < 100; i++) {
            val = 2 * i;
            a[i] = val;
        }
    }

When compiling it for x86_64 with -O3 -fno-unroll-loops -fno-tree-vectorize, GCC 7.2 used to give:

    foo:
            xor     eax, eax
    .L2:
            mov     DWORD PTR [rdi], eax
            add     eax, 2
            add     rdi, 4
            cmp     eax, 200
            jne     .L2
            rep ret

which was rather wasteful, as eax and rdi - eax are linearly related.

With GCC 13.2 or trunk on GodBolt as of today, this improves, but not really:

    foo:
            xor     eax, eax
    .L2:
            lea     edx, [rax+rax]
            mov     DWORD PTR [rdi+rax*4], edx
            add     rax, 1
            cmp     rax, 100
            jne     .L2
            ret

So, we don't increment two things; but - we do have an addition-via-lea in each iteration. Is that really necessary? I mean, instead of keeping the i variable (in rax), we could keep v = 2 * i, and that's good enough for both addressing and condition checking.

Indeed, clang 17 emits:

    foo:                                    # @foo
            xor     eax, eax
    .LBB0_1:                                # =>This Inner Loop Header: Depth=1
            mov     dword ptr [rdi + 2*rax], eax
            add     rax, 2
            cmp     rax, 200
            jne     .LBB0_1
            ret

which is almost the same, except that it holds v = 2 * i rather than i. (clang has produced this code since v3.0.0 at least.)

GodBolt link: https://gcc.godbolt.org/z/MjzTbr831

Originally discussed in this SO question: https://stackoverflow.com/q/48354636/1593077
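
For illustration, the transformation described above can be written out by hand at the source level. This is only a sketch: foo_derived and foo_variants_agree are hypothetical names that do not appear in the report, and no claim is made here about what code GCC generates for the rewritten form. It only shows that keeping v = 2 * i is enough to drive the store address (byte offset 2 * v, matching [rdi + 2*rax]), the stored value, and the exit test (v != 200).

```c
#include <assert.h>
#include <string.h>

enum { N = 100 };

/* The loop from the report, unchanged apart from using C99 restrict. */
void foo(int *restrict a)
{
    int i, val;
    for (i = 0; i < N; i++) {
        val = 2 * i;
        a[i] = val;
    }
}

/* Hypothetical hand-rewritten variant: the only loop-carried value is
 * v = 2 * i. The element a[i] lives at byte offset 4 * i == 2 * v
 * (assuming a 4-byte int, as on x86_64), and the loop ends when v
 * reaches 2 * N. */
void foo_derived(int *restrict a)
{
    for (long v = 0; v != 2 * N; v += 2) {
        *(int *)((char *)a + 2 * v) = (int)v;
    }
}

/* Returns 1 when both variants fill the array identically. */
int foo_variants_agree(void)
{
    int x[N], y[N];
    foo(x);
    foo_derived(y);
    assert(x[N - 1] == 2 * (N - 1)); /* last element is 198 */
    return memcmp(x, y, sizeof x) == 0;
}
```

Calling foo_variants_agree() from a small test driver checks that the two loops write the same 100 values, which is the equivalence the proposed optimization relies on.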