On December 21, 2020 10:03:43 AM GMT+01:00, Xiong Hu Luo <luo...@linux.ibm.com> 
wrote:
>Here comes another case that requires run a pass once more, as this is
>not the common suggested direction to solve problems, not quite sure
>whether it is still a reasonble fix here.  Source code is something
>like:
>
>ref = ip + *hslot;
>while (ip < in_end - 2) {
>  unsigned int len = 2;
>  len++;
>    for ()   {
>      do len++;
>      while (len < maxlen && ref[len] == ip[len]); //sink code here.
>      break;
>    }
>  len -= 2;
>  ip++;
>  ip += len + 1;
>  if (ip >= in_end - 2)
>    break;
>}
>
>Before ivopts, the gimple for inner while loop is xxx.c.172t.slp1:
>
>  <bb 31> [local count: 75120046]:
>  # len_160 = PHI <len_161(30), len_189(58)>
>  len_189 = len_160 + 1;
>  _423 = (sizetype) len_189;
>  _424 = ip_229 + _423;
>  if (maxlen_186 > len_189)
>    goto <bb 32>; [94.50%]
>  else
>    goto <bb 33>; [5.50%]
>
>  <bb 32> [local count: 70988443]:
>  _84 = *_424;
>  _86 = ref_182 + _423;
>  _87 = *_86;
>  if (_84 == _87)
>    goto <bb 58>; [94.50%]
>  else
>    goto <bb 33>; [5.50%]
>
>  <bb 58> [local count: 67084079]:
>  goto <bb 31>; [100.00%]
>
>  <bb 33> [local count: 14847855]:
>  # len_263 = PHI <len_160(32), len_160(31)>
>  # _262 = PHI <_423(32), _423(31)>
>  # _264 = PHI <_424(32), _424(31)>
>  len_190 = len_263 + 4294967295;
>  if (len_190 <= 6)
>    goto <bb 34>; [0.00%]
>  else
>    goto <bb 36>; [100.00%]
>
>Then in ivopts, instructions are updated to xxx.c.174t.ivopts:
>
>  <bb 31> [local count: 75120046]:
>  # ivtmp.30_29 = PHI <ivtmp.30_32(30), ivtmp.30_31(58)>
>  _34 = (unsigned int) ivtmp.30_29;
>  len_160 = _34 + 4294967295;
>  _423 = ivtmp.30_29;
>  _35 = (unsigned long) ip_229;
>  _420 = ivtmp.30_29 + _35;
>  _419 = (uint8_t *) _420;
>  _424 = _419;
>  len_418 = (unsigned int) ivtmp.30_29;
>  if (maxlen_186 > len_418)
>    goto <bb 32>; [94.50%]
>  else
>    goto <bb 33>; [5.50%]
>
>  <bb 32> [local count: 70988443]:
>  _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_29 * 1];
>  ivtmp.30_31 = ivtmp.30_29 + 1;
>  _417 = ref_182 + 18446744073709551615;
>  _87 = MEM[(uint8_t *)_417 + ivtmp.30_31 * 1];
>  if (_84 == _87)
>    goto <bb 58>; [94.50%]
>  else
>    goto <bb 33>; [5.50%]
>
>  <bb 58> [local count: 67084079]:
>  goto <bb 31>; [100.00%]
>
>  <bb 33> [local count: 14847855]:
>  # len_263 = PHI <len_160(32), len_160(31)>
>  # _262 = PHI <_423(32), _423(31)>
>  # _264 = PHI <_424(32), _424(31)>
>  len_190 = len_263 + 4294967295;
>  if (len_190 <= 6)
>    goto <bb 34>; [0.00%]
>  else
>    goto <bb 36>; [100.00%]
>
>Some instructions in BB 31 are not used in the loop and could be sinked
>out of loop to reduce the computation, but they are not sinked
>throughout all passes later.  Run the sink_code pass once more at least
>after fre5 could improve this typical case performance 23% due to few
>instructions exausted in loop.
>xxx.c.209t.sink2:
>
>Sinking _419 = (uint8_t *) _420;
> from bb 31 to bb 89
>Sinking _420 = ivtmp.30_29 + _35;
> from bb 31 to bb 89
>Sinking _35 = (unsigned long) ip_229;
> from bb 31 to bb 89
>Sinking len_160 = _34 + 4294967295;
> from bb 31 to bb 33
>
>I also tested the SPEC2017 performance on P8LE, 544.nab_r is improved
>by 2.43%, but no big changes to other cases, GEOMEAN is improved quite
>small with 0.25%.
>
>The reason why it should be run after fre5 is fre would do some phi
>optimization to expose the optimization.  The patch put it after
>pass_modref is due to my guess that some gimple optimizations like
>thread_jumps, dse, dce etc. could provide more opportunities for
>sinking code.  Not sure it is the correct place to put.  I also
>verified this issue exists in both X86 and ARM64.
>Any comments?  Thanks.

It definitely should be before uncprop (but context stops there). And yes, 
re-running passes isn't the very, very best thing to do without explaining it 
cannot be done in other ways. Not for late stage 3 anyway. 

Richard. 

>---
> gcc/passes.def      | 1 +
> gcc/tree-ssa-sink.c | 1 +
> 2 files changed, 2 insertions(+)
>
>diff --git a/gcc/passes.def b/gcc/passes.def
>index 21b2e2af0f7..69106615729 100644
>--- a/gcc/passes.def
>+++ b/gcc/passes.def
>@@ -355,6 +355,7 @@ along with GCC; see the file COPYING3.  If not see
>       NEXT_PASS (pass_uncprop);
>       NEXT_PASS (pass_local_pure_const);
>       NEXT_PASS (pass_modref);
>+      NEXT_PASS (pass_sink_code);
>   POP_INSERT_PASSES ()
>   NEXT_PASS (pass_all_optimizations_g);
>   PUSH_INSERT_PASSES_WITHIN (pass_all_optimizations_g)
>diff --git a/gcc/tree-ssa-sink.c b/gcc/tree-ssa-sink.c
>index b0abf4147d6..824659f3919 100644
>--- a/gcc/tree-ssa-sink.c
>+++ b/gcc/tree-ssa-sink.c
>@@ -819,6 +819,7 @@ public:
>   /* opt_pass methods: */
>   virtual bool gate (function *) { return flag_tree_sink != 0; }
>   virtual unsigned int execute (function *);
>+  opt_pass *clone (void) { return new pass_sink_code (m_ctxt); }
> 
> }; // class pass_sink_code
> 

Reply via email to