https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77498
--- Comment #12 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 22 Mar 2017, thopre01 at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77498
>
> --- Comment #11 from Thomas Preud'homme <thopre01 at gcc dot gnu.org> ---
> (In reply to Thomas Preud'homme from comment #9)
> > Sadly I could not come up with a minimal testcase so far. What I can see
> > from the code is that tree code hoisting increases the live range of some
> > values which then translates into more spilling in reload.
> >
> > As an approximation I'm wondering if the maximum distance (computed in
> > number of blocks traversed) from the definition to the use could be used to
> > limit when the optimization is applied when optimizing for speed.
>
> I finally managed. The bug can be reproduced by building the following for
> arm-none-eabi with -S -O2 -mcpu=cortex-m7 and looking for the push in the
> resulting assembly code.
>
> fn1() {
>   char *a;
>   char b;
>   for (; *a; a++) {
>     if (b)
>       a++;
>     fn2();
>   }
> }
>
> With -O2: r3, r4, r5 and lr are pushed.
> With -O2 -fno-code-hoisting: r4 and lr are pushed only.
>
>
> Similarly for -mcpu=cortex-m0plus:
>
> enum { ENUM1, ENUM2, ENUM3 } a;
> fn1() {
>   char *b;
>   for (; *b && a != ENUM2; b++)
>     switch (a) {
>     case ENUM1: a = ENUM3;
>     }
> }

But that's not caused by r239414 so please open a new bug for this.
(confirmed with a cross)

Transform:

  <bb 3> [85.00%]:
  # a_14 = PHI <a_10(8), a_5(D)(7)>
  if (b_7(D) != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 10>; [50.00%]

  <bb 10> [42.50%]:
  goto <bb 5>; [100.00%]

  <bb 4> [42.50%]:
  a_8 = a_14 + 1;

  <bb 5> [85.00%]:
  # a_2 = PHI <a_14(10), a_8(4)>
  fn2 ();
  a_10 = a_2 + 1;

to

  <bb 3> [85.00%]:
  # a_14 = PHI <prephitmp_12(5), a_5(D)(2)>
  _4 = a_14 + 1;
  if (b_7(D) != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 4> [42.50%]:
  _3 = _4 + 1;

  <bb 5> [85.00%]:
  # a_2 = PHI <a_14(3), _4(4)>
  # prephitmp_12 = PHI <_4(3), _3(4)>
  fn2 ();

that's because the hoisting (which itself isn't a problem) makes
a_2 + 1 partially redundant over the latch.  We see this issue in
related testcases where PRE can compute a constant for the first
iteration value of expressions and thus inserts IVs for them.

So it's nothing new and a fix would hopefully fix those cases as well.
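
For readers less used to GIMPLE dumps, the two shapes in the dumps can be
sketched at the source level roughly as follows.  This is only an
illustration, not compiler output: the names fn1_before, fn1_after and
count_calls, the explicit parameters (the testcase leaves a and b
uninitialized) and the counting fn2 are assumptions made here so that the
sketch can be compiled and run.

```c
/* Hypothetical stand-in for the external fn2() of the testcase:
   it only counts how often it is called. */
static int calls;
static void fn2(void) { calls++; }

/* Shape before the transform: "a + 1" is computed in the conditional
   block (a_8) and again after the call (a_10). */
static void fn1_before(const char *a, char b)
{
    for (; *a; a++) {
        if (b)
            a++;
        fn2();
    }
}

/* Shape after the transform: _4 = a + 1 is hoisted above the branch and
   the value for the next iteration (prephitmp_12) is already selected
   before the call to fn2. */
static void fn1_after(const char *a, char b)
{
    while (*a) {
        const char *t4 = a + 1;   /* _4 = a_14 + 1 */
        const char *next = t4;    /* prephitmp_12 when b == 0 */
        if (b)
            next = t4 + 1;        /* _3 = _4 + 1 */
        fn2();
        a = next;                 /* a_14 = prephitmp_12 over the latch */
    }
}

/* Helper (an assumption of this sketch): run one variant on s and
   report how many times fn2 was called. */
static int count_calls(void (*fn)(const char *, char), const char *s, char b)
{
    calls = 0;
    fn(s, b);
    return calls;
}
```

Both variants should call fn2 the same number of times for the same input,
for example count_calls(fn1_before, "abcd", 1) and
count_calls(fn1_after, "abcd", 1).  With b != 0 the pointer advances by two
per iteration, so the input needs an even length (or zero padding) for the
loop to land on the terminator.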