
rguenther at suse dot de Wed, 22 Mar 2017 05:58:20 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77498


--- Comment #12 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 22 Mar 2017, thopre01 at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77498
> 
> --- Comment #11 from Thomas Preud'homme <thopre01 at gcc dot gnu.org> ---
> (In reply to Thomas Preud'homme from comment #9)
> > Sadly I could not come up with a minimal testcase so far. What I can see
> > from the code is that tree code hoisting increases the live range of some
> > values which then translates into more spilling in reload.
> > 
> > As an approximation I'm wondering if the maximum distance (computed in
> > number of blocks traversed) from the definition to the use could be used to
> > limit when the optimization is applied when optimizing for speed.
> 
> I finally managed. The bug can be reproduced by building the following for
> arm-none-eabi with -S -O2 -mcpu=cortex-m7 and looking for the push in the
> resulting assembly code.
> 
> fn1() {
>   char *a;
>   char b;
>   for (; *a; a++) {
>     if (b)
>       a++;
>     fn2();
>   }
> }
> 
> With -O2: r3, r4, r5 and lr are pushed.
> With -O2 -fno-code-hoisting: only r4 and lr are pushed.
> 
> 
> Similarly for -mcpu=cortex-m0plus:
> 
> enum { ENUM1, ENUM2, ENUM3 } a;
> fn1() {
>   char *b;
>   for (; *b && a != ENUM2; b++)
>     switch (a) {
>       case ENUM1: a = ENUM3;
>     }
> }

But that's not caused by r239414, so please open a new bug for this
(confirmed with a cross compiler).

Transform:

  <bb 3> [85.00%]:
  # a_14 = PHI <a_10(8), a_5(D)(7)>
  if (b_7(D) != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 10>; [50.00%]

  <bb 10> [42.50%]:
  goto <bb 5>; [100.00%]

  <bb 4> [42.50%]:
  a_8 = a_14 + 1;

  <bb 5> [85.00%]:
  # a_2 = PHI <a_14(10), a_8(4)>
  fn2 ();
  a_10 = a_2 + 1;

to

  <bb 3> [85.00%]:
  # a_14 = PHI <prephitmp_12(5), a_5(D)(2)>
  _4 = a_14 + 1;
  if (b_7(D) != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 4> [42.50%]:
  _3 = _4 + 1;

  <bb 5> [85.00%]:
  # a_2 = PHI <a_14(3), _4(4)>
  # prephitmp_12 = PHI <_4(3), _3(4)>
  fn2 ();

That's because the hoisting (which in itself isn't a problem) makes
a_2 + 1 partially redundant over the latch.  We see this issue
in related testcases where PRE can compute a constant for the
first-iteration value of an expression and thus inserts IVs for
it.  So it's nothing new, and a fix would hopefully fix those
cases as well.
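
For readers who prefer source over GIMPLE, the two dumps above can be
read roughly as the following C.  This is only a sketch, not compiler
output: the function names, the parameters (the reduced testcase uses
uninitialized locals instead), the call counter inside fn2 and the
check_equivalence helper are all hypothetical and exist only so the two
shapes can be compared.  The loop exit test is not shown in the dumps
and is written here as a plain while condition.

```c
#include <assert.h>

/* Hypothetical stand-in for fn2: counts how often it is called. */
static int calls;
static void fn2(void) { calls++; }

/* Shape of the first dump: the final increment follows the call. */
static const char *walk_before(const char *a, char b)
{
    while (*a) {
        const char *a_2 = a;     /* a_2 = a_14 on the fall-through path */
        if (b)
            a_2 = a + 1;         /* a_8 = a_14 + 1 */
        fn2();
        a = a_2 + 1;             /* a_10 = a_2 + 1 */
    }
    return a;
}

/* Shape of the second dump: a + 1 is hoisted above the branch and the
   next-iteration value is fully computed before the call. */
static const char *walk_after(const char *a, char b)
{
    while (*a) {
        const char *t4 = a + 1;  /* _4 = a_14 + 1 */
        const char *next = t4;   /* prephitmp_12 = _4 when coming from bb 3 */
        if (b)
            next = t4 + 1;       /* _3 = _4 + 1 */
        fn2();
        a = next;                /* a_14 = prephitmp_12 on the back edge */
    }
    return a;
}

/* Hypothetical helper: both shapes must end at the same pointer and
   call fn2 the same number of times.  With b != 0 the pointer advances
   by two per iteration, so s should have even length to stay inside
   the string. */
static void check_equivalence(const char *s, char b)
{
    calls = 0;
    const char *end_before = walk_before(s, b);
    int calls_before = calls;

    calls = 0;
    const char *end_after = walk_after(s, b);

    assert(end_before == end_after);
    assert(calls_before == calls);
}
```

Both functions compute the same result; the only difference is where
the increments are evaluated relative to the call to fn2.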

[Bug tree-optimization/77498] [7 regression] Performance drop after r239414 on spec2000/172mgrid
