On Thu, 6 May 2021, Prathamesh Kulkarni wrote:

> On Thu, 6 May 2021 at 15:43, Richard Biener <rguent...@suse.de> wrote:
> >
> > On Thu, 6 May 2021, Prathamesh Kulkarni wrote:
> >
> > > Hi Richard,
> > > I was just wondering if second (and higher) order hoistings may defeat
> > > the "AVAIL_OUT in at least one successor heuristic" ?
> > >
> > > For eg:
> > > bb2:
> > > if (cond1) goto bb3 else goto bb4;
> > >
> > > bb3:
> > > if (cond2) goto bb5 else goto bb6;
> > >
> > > bb5:
> > > return x + y;
> > >
> > > bb6:
> > > return x + y;
> > >
> > > bb4:
> > > if (cond3) goto bb7 else goto bb8;
> > >
> > > bb7:
> > > return x + y;
> > >
> > > bb8:
> > > return x + y;
> > >
> > > In above CFG, IIUC, we will end up hoisting x + y in bb2 even if bb3
> > > or bb4 don't originally
> > > contain x + y since during first hoisting pass x + y will be hoisted
> > > from bb5, bb6 into bb3,
> > > and likewise from bb7, bb8 into bb4 and during second hoisting pass it
> > > will hoist x + y into bb2, since x + y is now available in bb3 and
> > > bb4.
> >
> > But that's intended - the logic is _supposed_ to do "everything
> > at once", thus repeated runs of hoisting should not hoist more.
> > Which is also why we no longer iterate hoist insertion.
> >
> > > This might become an issue if successive hoistings will end up
> > > hoisting the expression "too far" resulting in long range hoisting ?
> >
> > I think this "long range hoisting" argument is a red herring...
> >
> > > Also if we're hoisting from post-dom block in which case the
> > > expression may not be truly ANTIC, and eventually end up being
> > > inserted into a successor block by successive hoisting passes.
> >
> > But this shouldn't happen - can you show a testcase where it does?
> Well, I was thinking of this test-case:
> 
> int f(int cond1, int cond2, int cond3, int x, int y)
> {
>   void f1();
>   void f2(int);
>   void f3();
> 
>   if (cond1)
>     f1 ();
>   else
>     {
>       if (cond2)
>         f2 (x + y);
>       else
>         f3 ();
>     }
> 
>   return x + y;
> }
> 
> The input to PRE pass is:
> 
> f (int cond1, int cond2, int cond3, int x, int y)
> {
>   int _1;
>   int _11;
> 
>   <bb 2> [local count: 1073741824]:
>   if (cond1_3(D) != 0)
>     goto <bb 3>; [33.00%]
>   else
>     goto <bb 4>; [67.00%]
> 
>   <bb 3> [local count: 354334800]:
>   f1 ();
>   goto <bb 7>; [100.00%]
> 
>   <bb 4> [local count: 719407025]:
>   if (cond2_4(D) != 0)
>     goto <bb 5>; [50.00%]
>   else
>     goto <bb 6>; [50.00%]
> 
>   <bb 5> [local count: 359703512]:
>   _1 = x_7(D) + y_8(D);
>   f2 (_1);
>   goto <bb 7>; [100.00%]
> 
>   <bb 6> [local count: 359703512]:
>   f3 ();
> 
>   <bb 7> [local count: 1073741824]:
>   _11 = x_7(D) + y_8(D);
>   return _11;
> }
> 
> pre dump shows:
> Inserting expression in block 4 for code hoisting:
> {plus_expr,x_7(D),y_8(D)} (0001)
> Inserted _12 = x_7(D) + y_8(D);
>  in predecessor 4 (0001)
> Found partial redundancy for expression {plus_expr,x_7(D),y_8(D)} (0001)
> Inserted _13 = x_7(D) + y_8(D);
>  in predecessor 3 (0001)

You are clearly looking at old GCC - GCC 11+ first do all PRE
insertion and only then do a _single_ hoist insert run.  But still
it happens exactly as designed - there's a partial redundancy
for the x + y add on BB7 predecessors so we insert in BB6 and BB3.
That in turn leaves us with a perfect opportunity for hoisting
where I don't see any reason do _not_ perform it since it saves
us two copies of x + y.

Trunk:

Starting iteration 1
Starting insert iteration 1
Found partial redundancy for expression {plus_expr,x_7(D),y_8(D)} (0001)
Inserted _12 = x_7(D) + y_8(D);
 in predecessor 3 (0001)
Inserted _13 = x_7(D) + y_8(D);
 in predecessor 6 (0001)
Created phi prephitmp_14 = PHI <_12(3), _1(5), _13(6)>
 in block 7 (0001)
Inserting expression in block 4 for code hoisting: 
{plus_expr,x_7(D),y_8(D)} (0001)
Inserted _15 = x_7(D) + y_8(D);
 in predecessor 4 (0001)
Inserting expression in block 2 for code hoisting: 
{plus_expr,x_7(D),y_8(D)} (0001)
Inserted _16 = x_7(D) + y_8(D);
 in predecessor 2 (0001)

(the hoist insert into BB4 is spurious)

> Created phi prephitmp_14 = PHI <_13(3), _12(5), _12(6)>
>  in block 7 (0001)
> Starting insert iteration 2
> Inserting expression in block 2 for code hoisting:
> {plus_expr,x_7(D),y_8(D)} (0001)
> Inserted _15 = x_7(D) + y_8(D);
>  in predecessor 2 (0001)
> 
> So IIUC, during first insert iteration, it is hoisting x + y into bb4
> and PRE inserts x + y into bb3.
> And during second iteration, it hoists x + y into bb2.
> So we are effectively hoisting x + y from bb5, bb7 into bb2, which
> doesn't seem to benefit other
> two paths (bb2->bb3->bb7 and bb2->bb4->bb6->bb7) since they don't contain x + 
> y.
> I am not sure if we should be hoisting x + y into bb2 for this case ?

We should.  We're optimizing the partially redundant compute in BB5
and the insertion point with the least amount of code generated is
in BB2.

Richard.

Reply via email to