On Tue, 17 Aug 2021, Xionghu Luo wrote:

> Hi,
> 
> On 2021/8/16 19:46, Richard Biener wrote:
> > On Mon, 16 Aug 2021, Xiong Hu Luo wrote:
> > 
> >> It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
> >> nested loops.  inn_loop is updated to inner loop, so it need be restored
> >> when exiting from innermost loop. With this patch, the store instruction
> >> in outer loop could also be moved out of outer loop by store motion.
> >> Any comments?  Thanks.
> > 
> >> gcc/ChangeLog:
> >>
> >>  * tree-ssa-loop-im.c (fill_always_executed_in_1): Restore
> >>  inn_loop when exiting from innermost loop.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>    * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
> >> ---
> >>   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++++++++++++++++++++++
> >>   gcc/tree-ssa-loop-im.c                     |  6 +++++-
> >>   2 files changed, 29 insertions(+), 1 deletion(-)
> >>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> >> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> >> new file mode 100644
> >> index 00000000000..097a5ee4a4b
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> >> @@ -0,0 +1,24 @@
> >> +/* PR/101293 */
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> >> +
> >> +struct X { int i; int j; int k;};
> >> +
> >> +void foo(struct X *x, int n, int l)
> >> +{
> >> +  for (int j = 0; j < l; j++)
> >> +    {
> >> +      for (int i = 0; i < n; ++i)
> >> +  {
> >> +    int *p = &x->j;
> >> +    int tem = *p;
> >> +    x->j += tem * i;
> >> +  }
> >> +      int *r = &x->k;
> >> +      int tem2 = *r;
> >> +      x->k += tem2 * j;
> >> +    }
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } }
> >> */
> >> +
> >> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> >> index b24bc64f2a7..5ca4738b20e 100644
> >> --- a/gcc/tree-ssa-loop-im.c
> >> +++ b/gcc/tree-ssa-loop-im.c
> >> @@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, sbitmap
> >> @@ contains_call)
> >>      if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> >>        last = bb;
> >>   +          if (inn_loop != loop
> >> +        && flow_loop_nested_p (bb->loop_father, inn_loop))
> >> +      inn_loop = bb->loop_father;
> >> +
> > 
> > The comment says
> > 
> >                /* In a loop that is always entered we may proceed anyway.
> >                   But record that we entered it and stop once we leave it.
> > */
> >                inn_loop = bb->loop_father;
> > 
> > and your change would defeat that early return, no?
> 
> The issue is the search method exits too early when iterating the outer
> loop.  For example of a nested loop, loop 1 includes 5,8,3,10,4,9
> and loop2 includes 3,10.  Currently, it breaks when bb is 3 as bb 3
> doesn't dominate bb 9 of loop 1.  But actually, both bb 5 and bb 4 are
> ALWAYS_EXECUTED for loop 1, so if there are store instructions in bb 4
> they won't be processed by store motion again.
> 
> 
>     5<----
>     |\   |
>     8 \  9
>     |  \ |
> --->3--->4
> |    | 
> 10---|
> 
> 
> SET_ALWAYS_EXECUTED_IN is only set to bb 5 on master code now, with this
> patch, it will continue search when meet bb 3 until bb 4, then last is updated
> to bb 4, it will break until exit edge is found at bb 4 by
> "if (!flow_bb_inside_loop_p (loop, e->dest))".  Then the followed loop code
> will
> set bb 4 as ALWAYS_EXEUCTED and all it's idoms bb 5.
> 
> 
>      while (1)
>       {
>         SET_ALWAYS_EXECUTED_IN (last, loop);
>         if (last == loop->header)
>           break;
>         last = get_immediate_dominator (CDI_DOMINATORS, last);
>       }
> 
> After further discussion with Kewen, we found that the inn_loop variable is
> totally useless and could be removed.
> 
> 
> > 
> >>      if (bitmap_bit_p (contains_call, bb->index))
> >>        break;
> >>   
> >> @@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, sbitmap
> >> @@ contains_call)
> >>   
> >>      if (bb->loop_father->header == bb)
> >>        {
> >> -        if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> >> +        if (!dominated_by_p (CDI_DOMINATORS, bb->loop_father->latch,
> >> bb))
> >>     break;
> > 
> > That's now a always false condition - a loops latch is always dominated
> > by its header.  The condition as written tries to verify whether the
> > loop is always entered - mind we visit all blocks, not only those
> > always executed.
> 
> Thanks for the catch!  I am afraid the piece of code should be removed since
> it stops
> search of potential ALWAYS EXECUTED bb after inner loop...

But the code says:

            /* In a loop that is always entered we may proceed anyway.
               But record that we entered it and stop once we leave it.
             */

and you do not remove this comment still it doesn't hold anymore
after your patch.  I don't say the current code is optimal - I just
say it does exactly what is documented.


> > 
> > In fact for your testcase the x->j ref is _not_ always executed
> > since the inner loop is conditional on n > 0.
> 
> Yes.  But I want to move x->k (not x->j) out of loop 1 when l > 0 in
> store-motion.

Sorry, that wasn't clear.  Yes, I agree the code fails to see always
executed blocks after inner loops.  But since the code simply walks
all blocks in a loop instead of greedily following edges it has
to do that since the inner loop might exit the outer loop as well,
sth your change wouldn't honor.  Consider

 while (--n)
  {
    do
      {
        if (do_exit)
          goto out;
      }
    while (1);
    p->x += 1;
  }
out:;

you'll see a CFG where the inner loop exits the outer loop as well.

So I'm saying to improve the code you'll likely have to do more.
And add a testcase like the following

void __attribute__((noipa))
foo (int n, int m, int f, int *p, int *q)
{
 while (--n)
  {
    do
      {
        *q += 1;
        if (f)
          goto out;
      }
    while (--m);
    *p += 1;
  }
out:;
}

int main()
{
  int i = 0;
  foo (10, 10, 1, (void *)0, &i);
  if (i != 1)
    __builtin_abort ();
  return 0;
}

Richard.

> Attached the diff file without and with my patch to show the extra
> optimization.
> 
> x->j is already moved out of loop 2 on master code.
> If change n and l to constant numbers like 100, master code could also do 2
> store
> motions as expected. The edge from bb 5 to bb 4 doesn't exist now, so bb 4, bb
> 3
> and bb 5 are ALWAYS EXECUTED for loop 1.
> 
> 
> struct X { int i; int j; int k;};
> 
> void foo(struct X *x, int n, int l)
> {
>  for (int j = 0; j < l; j++) // loop 1
>    {
>      for (int i = 0; i < n; ++i)  // loop 2
>        {
>          int *p = &x->j;
>          int tem = *p;
>          x->j += tem * i;
>        }
>      int *r = &x->k;
>      int tem2 = *r;
>      x->k += tem2 * j;
>    }
> }
> 
> > 
> > Richard.
> > 
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Reply via email to