On Thu, 6 May 2021 at 17:01, Richard Biener <rguent...@suse.de> wrote:
>
> On Thu, 6 May 2021, Prathamesh Kulkarni wrote:
>
> > On Thu, 6 May 2021 at 15:43, Richard Biener <rguent...@suse.de> wrote:
> > >
> > > On Thu, 6 May 2021, Prathamesh Kulkarni wrote:
> > >
> > > > Hi Richard,
> > > > I was just wondering if second (and higher) order hoistings may defeat
> > > > the "AVAIL_OUT in at least one successor heuristic" ?
> > > >
> > > > For eg:
> > > > bb2:
> > > > if (cond1) goto bb3 else goto bb4;
> > > >
> > > > bb3:
> > > > if (cond2) goto bb5 else goto bb6;
> > > >
> > > > bb5:
> > > > return x + y;
> > > >
> > > > bb6:
> > > > return x + y;
> > > >
> > > > bb4:
> > > > if (cond3) goto bb7 else goto bb8;
> > > >
> > > > bb7:
> > > > return x + y;
> > > >
> > > > bb8:
> > > > return x + y;
> > > >
> > > > In above CFG, IIUC, we will end up hoisting x + y in bb2 even if bb3
> > > > or bb4 don't originally
> > > > contain x + y since during first hoisting pass x + y will be hoisted
> > > > from bb5, bb6 into bb3,
> > > > and likewise from bb7, bb8 into bb4 and during second hoisting pass it
> > > > will hoist x + y into bb2, since x + y is now available in bb3 and
> > > > bb4.
> > >
> > > But that's intended - the logic is _supposed_ to do "everything
> > > at once", thus repeated runs of hoisting should not hoist more.
> > > Which is also why we no longer iterate hoist insertion.
> > >
> > > > This might become an issue if successive hoistings will end up
> > > > hoisting the expression "too far" resulting in long range hoisting ?
> > >
> > > I think this "long range hoisting" argument is a red herring...
> > >
> > > > Also if we're hoisting from post-dom block in which case the
> > > > expression may not be truly ANTIC, and eventually end up being
> > > > inserted into a successor block by successive hoisting passes.
> > >
> > > But this shouldn't happen - can you show a testcase where it does?
> > Well, I was thinking of this test-case:
> >
> > int f(int cond1, int cond2, int cond3, int x, int y)
> > {
> >   void f1();
> >   void f2(int);
> >   void f3();
> >
> >   if (cond1)
> >     f1 ();
> >   else
> >     {
> >       if (cond2)
> >         f2 (x + y);
> >       else
> >         f3 ();
> >     }
> >
> >   return x + y;
> > }
> >
> > The input to PRE pass is:
> >
> > f (int cond1, int cond2, int cond3, int x, int y)
> > {
> >   int _1;
> >   int _11;
> >
> >   <bb 2> [local count: 1073741824]:
> >   if (cond1_3(D) != 0)
> >     goto <bb 3>; [33.00%]
> >   else
> >     goto <bb 4>; [67.00%]
> >
> >   <bb 3> [local count: 354334800]:
> >   f1 ();
> >   goto <bb 7>; [100.00%]
> >
> >   <bb 4> [local count: 719407025]:
> >   if (cond2_4(D) != 0)
> >     goto <bb 5>; [50.00%]
> >   else
> >     goto <bb 6>; [50.00%]
> >
> >   <bb 5> [local count: 359703512]:
> >   _1 = x_7(D) + y_8(D);
> >   f2 (_1);
> >   goto <bb 7>; [100.00%]
> >
> >   <bb 6> [local count: 359703512]:
> >   f3 ();
> >
> >   <bb 7> [local count: 1073741824]:
> >   _11 = x_7(D) + y_8(D);
> >   return _11;
> > }
> >
> > pre dump shows:
> > Inserting expression in block 4 for code hoisting:
> > {plus_expr,x_7(D),y_8(D)} (0001)
> > Inserted _12 = x_7(D) + y_8(D);
> >  in predecessor 4 (0001)
> > Found partial redundancy for expression {plus_expr,x_7(D),y_8(D)} (0001)
> > Inserted _13 = x_7(D) + y_8(D);
> >  in predecessor 3 (0001)
>
> You are clearly looking at old GCC - GCC 11+ first do all PRE
Oops, sorry! I was looking at an old branch.
> insertion and only then do a _single_ hoist insert run.  But still
> it happens exactly as designed - there's a partial redundancy
> for the x + y add on BB7 predecessors so we insert in BB6 and BB3.
> That in turn leaves us with a perfect opportunity for hoisting
> where I don't see any reason do _not_ perform it since it saves
> us two copies of x + y.
>
> Trunk:
>
> Starting iteration 1
> Starting insert iteration 1
> Found partial redundancy for expression {plus_expr,x_7(D),y_8(D)} (0001)
> Inserted _12 = x_7(D) + y_8(D);
>  in predecessor 3 (0001)
> Inserted _13 = x_7(D) + y_8(D);
>  in predecessor 6 (0001)
> Created phi prephitmp_14 = PHI <_12(3), _1(5), _13(6)>
>  in block 7 (0001)
> Inserting expression in block 4 for code hoisting:
> {plus_expr,x_7(D),y_8(D)} (0001)
> Inserted _15 = x_7(D) + y_8(D);
>  in predecessor 4 (0001)
> Inserting expression in block 2 for code hoisting:
> {plus_expr,x_7(D),y_8(D)} (0001)
> Inserted _16 = x_7(D) + y_8(D);
>  in predecessor 2 (0001)
>
> (the hoist insert into BB4 is spurious)
>
> > Created phi prephitmp_14 = PHI <_13(3), _12(5), _12(6)>
> >  in block 7 (0001)
> > Starting insert iteration 2
> > Inserting expression in block 2 for code hoisting:
> > {plus_expr,x_7(D),y_8(D)} (0001)
> > Inserted _15 = x_7(D) + y_8(D);
> >  in predecessor 2 (0001)
> >
> > So IIUC, during first insert iteration, it is hoisting x + y into bb4
> > and PRE inserts x + y into bb3.
> > And during second iteration, it hoists x + y into bb2.
> > So we are effectively hoisting x + y from bb5, bb7 into bb2, which
> > doesn't seem to benefit other
> > two paths (bb2->bb3->bb7 and bb2->bb4->bb6->bb7) since they don't contain x 
> > + y.
> > I am not sure if we should be hoisting x + y into bb2 for this case ?
>
> We should.  We're optimizing the partially redundant compute in BB5
> and the insertion point with the least amount of code generated is
> in BB2.
Right, thanks for the clarification and sorry for the noise.

Regards,
Prathamesh
>
> Richard.

Reply via email to