http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39838
--- Comment #16 from bin.cheng <amker.cheng at gmail dot com> --- For optimization level O2, the dump before IVOPT is like: <bb 2>: _21 = p_6(D)->count; if (_21 > 0) goto <bb 3>; else goto <bb 11>; <bb 3>: <bb 4>: # i_26 = PHI <i_20(10), 0(3)> if (count_8(D) > 0) goto <bb 5>; else goto <bb 9>; <bb 5>: pretmp_23 = (sizetype) i_26; pretmp_32 = pretmp_23 + 1; pretmp_33 = pretmp_32 * 4; pretmp_34 = pretmp_23 * 4; <bb 6>: # j_27 = PHI <j_19(7), 0(5)> _9 = p_6(D)->data; _13 = _9 + pretmp_33; _14 = *_13; _16 = _9 + pretmp_34; _17 = *_16; func (_17, _14); j_19 = j_27 + 1; if (count_8(D) > j_19) goto <bb 7>; else goto <bb 8>; <bb 7>: goto <bb 6>; <bb 8>: <bb 9>: i_20 = i_26 + 1; _7 = p_6(D)->count; if (_7 > i_20) goto <bb 10>; else goto <bb 11>; <bb 10>: goto <bb 4>; <bb 11>: return; There might be two issues that block IVOPT choosing the biv(i) for pretmp_33 and pretmp_34: 1) on some target (like ARM), "i << 2 + 4" can be done in one instruction, if the cost is same as simple shift or plus, then overall cost of biv(i) is lower than the two candidate iv sets. GCC doesn't do such check in get_computation_cost_at for now. 2) there is CSE opportunity between computation of pretmp_33 and pretmp_34, for example they can be computed as below: pretmp_33 = i << 2 pretmp_34 = pretmp_33 + 4 but GCC IVOPT is insensitive to such CSE opportunities between different iv uses. I guess this isn't easy because unless the two uses are very close in code (like this one), such CSE may avail to nothing. These kind tweaks on cost are tricky(and most probably has no overall benefit) because the cost IVOPT computed from RTL is far from precise to do such fine granularity tuning. Another point, as Zdenek pointed out, IVOPT doesn't know that pretmp_33/pretmp_34 are going to be used in memory accesses, which means some of address computation can be embedded by appropriate addressing mode. In other words, computation of pretmp_33/pretmp_34 shouldn't be honored when computing overall cost and choosing iv candidates set. Since "_9 + pretmp_33/pretmp_34" is not affine iv, the only way to handle this issue is to lower both memory accesses before IVOPT, into some code like below: <bb 5>: pretmp_23 = (sizetype) i_26; pretmp_32 = pretmp_23 + 1; <bb 6>: # j_27 = PHI <j_19(7), 0(5)> _9 = p_6(D)->data; _14 = MEM[_9 + pretmp_32 << 2]; _17 = MEM[_9 + pretmp_23 << 2]; func (_17, _14); j_19 = j_27 + 1; if (count_8(D) > j_19) goto <bb 7>; else goto <bb 8>; With this code, the iv uses are biv(i), pretmp_23(i_26) and pretmp_32(i_26+1), and IVOPT won't even add the annoying candidate.