https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81611
Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |law at redhat dot com

--- Comment #2 from Jeffrey A. Law <law at redhat dot com> ---
So we get good code just prior to this:

    Author: rguenth <rguenth@138bc75d-0d04-0410-961f-82ee72b054a4>
    Date:   Tue Apr 21 12:52:43 2015 +0000

        2015-04-21  Richard Biener  <rguent...@suse.de>

                PR tree-optimization/65650
                * tree-ssa-ccp.c (valid_lattice_transition): Allow lattice
                transitions involving copies.
                (set_lattice_value): Adjust for copy lattice state.
        [ ... ]

That change results in propagation of a copy (no surprise there).  This
results in IVopts making some different choices.  Prior to the change the
loop looks like this in the .optimized dump:

  # x_1 = PHI <x_4(D)(2), x_13(3)>
  # ivtmp.7_16 = PHI <ivtmp.7_7(2), ivtmp.7_15(3)>
  str_2 = (char *) ivtmp.7_16;
  _9 = x_1 & 1;
  _10 = _9 + 48;
  _11 = (char) _10;
  MEM[base: str_2, offset: 0B] = _11;
  ivtmp.7_15 = ivtmp.7_16 + 1;
  x_13 = x_1 >> 1;
  if (x_13 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

Intuitively we can see the relationship between STR and IVTMP, and the likely
post-inc opportunity at the memory reference and subsequent increment of
IVTMP.

If we look at the loop after the referenced change we have:

  # x_1 = PHI <x_4(D)(2), x_13(3)>
  # str_2 = PHI <str_5(D)(2), str_8(3)>
  str_8 = str_2 + 1;
  _9 = x_1 & 1;
  _10 = _9 + 48;
  _11 = (char) _10;
  _16 = str_8 + 65535;
  MEM[base: _16, offset: 0B] = _11;
  x_13 = x_1 >> 1;
  if (x_13 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

So we no longer have the IV, just STR, and it's a lot harder to recover the
auto-inc opportunity at the memory reference.  Anyway, that's the point where
it looks to me like things start to go off the rails.
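For reference, a source loop of roughly this shape would produce the dumps
above.  This is my reconstruction from the GIMPLE, not necessarily the exact
testcase attached to the PR; the function name is made up:

```c
#include <string.h>

/* Hypothetical reconstruction of the loop behind the dumps above: emit the
   bits of X, least-significant bit first, as ASCII digits into STR.  The
   post-increment of STR is the auto-inc opportunity being discussed.  */
static void bits_lsb_first(unsigned x, char *str)
{
  do
    {
      *str++ = '0' + (x & 1);   /* _9 = x_1 & 1; _10 = _9 + 48; store   */
      x >>= 1;                  /* x_13 = x_1 >> 1                      */
    }
  while (x != 0);               /* if (x_13 != 0) goto <bb 3>           */
  *str = 0;                     /* the post-loop MEM[... + 1B] = 0      */
}
```

On AVR, `*str++` maps naturally onto the ST X+ post-increment addressing
mode, which is why losing the auto-inc here costs instructions.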
If we walk forward to the trunk today and look at the .expand dump we have:

  # x_5 = PHI <x_8(D)(2), x_15(3)>
  # str_6 = PHI <str_9(D)(2), str_17(3)>
  _1 = x_5 & 1;
  _2 = _1 + 48;
  str_11 = str_6 + 1;
  _4 = (char) _2;
  _16 = str_6;
  MEM[base: _16, offset: 0B] = _4;
  x_13 = x_5 >> 1;
  x_15 = x_13;
  str_17 = str_11;
  if (x_5 > 1)
    goto <bb 3>; [85.00%]
  else
    goto <bb 4>; [15.00%]

  ;; basic block 4, loop depth 0
  ;;  pred:       3
  MEM[(char *)str_6 + 1B] = 0;
  return;

I think part of the problem is that we need both str_6 and str_11 -- they
have different values and so they conflict.  The two MEMs could potentially
be rewritten in terms of str_11.  With the obvious copy-props we'd have
something like this:

  # x_5 = PHI <x_8(D)(2), x_13(3)>
  # str_6 = PHI <str_9(D)(2), str_11(3)>
  _1 = x_5 & 1;
  _2 = _1 + 48;
  str_11 = str_6 + 1;
  _4 = (char) _2;
  MEM[base: str_11, offset: -1B] = _4;
  x_13 = x_5 >> 1;
  if (x_5 > 1)
    goto <bb 3>; [85.00%]
  else
    goto <bb 4>; [15.00%]

  ;; basic block 4, loop depth 0
  ;;  pred:       3
  MEM[(char *)str_11, offset: 0B] = 0;
  return;

That ought to allow str_6 and str_11 to coalesce.  The question then becomes
whether we can recover the auto-inc -- I'm not sure the auto-inc code is
good enough to see it in that form.

Most importantly, while this BZ is filed against the AVR target, it seems to
me to be clearly a generic issue.
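At the source level, the rewrite being proposed amounts to the equivalence
between the two forms below (a sketch in C; the function names are mine, and
this only illustrates the shapes, not the pass itself).  An auto-inc pass
would have to recognize that the second form is still a post-increment store:

```c
/* Analogue of the pre-IVopts shape: store at p, then p = p + 1.
   Maps directly onto a post-increment addressing mode.  */
static char *store_then_inc(char *p, char c)
{
  *p = c;            /* MEM[base: str_2, offset: 0B] = _11  */
  p = p + 1;         /* ivtmp.7_15 = ivtmp.7_16 + 1         */
  return p;
}

/* Analogue of the copy-propped shape: p = p + 1, then store at p[-1].
   Semantically identical, but the post-inc is no longer syntactically
   obvious -- the store's base is the already-incremented pointer.  */
static char *inc_then_store(char *p, char c)
{
  p = p + 1;         /* str_11 = str_6 + 1                   */
  p[-1] = c;         /* MEM[base: str_11, offset: -1B] = _4  */
  return p;
}
```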