On Thu, 6 May 2021 at 17:01, Richard Biener <rguent...@suse.de> wrote: > > On Thu, 6 May 2021, Prathamesh Kulkarni wrote: > > > On Thu, 6 May 2021 at 15:43, Richard Biener <rguent...@suse.de> wrote: > > > > > > On Thu, 6 May 2021, Prathamesh Kulkarni wrote: > > > > > > > Hi Richard, > > > > I was just wondering if second (and higher) order hoistings may defeat > > > > the "AVAIL_OUT in at least one successor heuristic" ? > > > > > > > > For eg: > > > > bb2: > > > > if (cond1) goto bb3 else goto bb4; > > > > > > > > bb3: > > > > if (cond2) goto bb5 else goto bb6; > > > > > > > > bb5: > > > > return x + y; > > > > > > > > bb6: > > > > return x + y; > > > > > > > > bb4: > > > > if (cond3) goto bb7 else goto bb8; > > > > > > > > bb7: > > > > return x + y; > > > > > > > > bb8: > > > > return x + y; > > > > > > > > In above CFG, IIUC, we will end up hoisting x + y in bb2 even if bb3 > > > > or bb4 don't originally > > > > contain x + y since during first hoisting pass x + y will be hoisted > > > > from bb5, bb6 into bb3, > > > > and likewise from bb7, bb8 into bb4 and during second hoisting pass it > > > > will hoist x + y into bb2, since x + y is now available in bb3 and > > > > bb4. > > > > > > But that's intended - the logic is _supposed_ to do "everything > > > at once", thus repeated runs of hoisting should not hoist more. > > > Which is also why we no longer iterate hoist insertion. > > > > > > > This might become an issue if successive hoistings will end up > > > > hoisting the expression "too far" resulting in long range hoisting ? > > > > > > I think this "long range hoisting" argument is a red herring... > > > > > > > Also if we're hoisting from post-dom block in which case the > > > > expression may not be truly ANTIC, and eventually end up being > > > > inserted into a successor block by successive hoisting passes. > > > > > > But this shouldn't happen - can you show a testcase where it does? > > Well, I was thinking of this test-case: > > > > int f(int cond1, int cond2, int cond3, int x, int y) > > { > > void f1(); > > void f2(int); > > void f3(); > > > > if (cond1) > > f1 (); > > else > > { > > if (cond2) > > f2 (x + y); > > else > > f3 (); > > } > > > > return x + y; > > } > > > > The input to PRE pass is: > > > > f (int cond1, int cond2, int cond3, int x, int y) > > { > > int _1; > > int _11; > > > > <bb 2> [local count: 1073741824]: > > if (cond1_3(D) != 0) > > goto <bb 3>; [33.00%] > > else > > goto <bb 4>; [67.00%] > > > > <bb 3> [local count: 354334800]: > > f1 (); > > goto <bb 7>; [100.00%] > > > > <bb 4> [local count: 719407025]: > > if (cond2_4(D) != 0) > > goto <bb 5>; [50.00%] > > else > > goto <bb 6>; [50.00%] > > > > <bb 5> [local count: 359703512]: > > _1 = x_7(D) + y_8(D); > > f2 (_1); > > goto <bb 7>; [100.00%] > > > > <bb 6> [local count: 359703512]: > > f3 (); > > > > <bb 7> [local count: 1073741824]: > > _11 = x_7(D) + y_8(D); > > return _11; > > } > > > > pre dump shows: > > Inserting expression in block 4 for code hoisting: > > {plus_expr,x_7(D),y_8(D)} (0001) > > Inserted _12 = x_7(D) + y_8(D); > > in predecessor 4 (0001) > > Found partial redundancy for expression {plus_expr,x_7(D),y_8(D)} (0001) > > Inserted _13 = x_7(D) + y_8(D); > > in predecessor 3 (0001) > > You are clearly looking at old GCC - GCC 11+ first do all PRE Oops, sorry! I was looking at an old branch. > insertion and only then do a _single_ hoist insert run. But still > it happens exactly as designed - there's a partial redundancy > for the x + y add on BB7 predecessors so we insert in BB6 and BB3. > That in turn leaves us with a perfect opportunity for hoisting > where I don't see any reason do _not_ perform it since it saves > us two copies of x + y. > > Trunk: > > Starting iteration 1 > Starting insert iteration 1 > Found partial redundancy for expression {plus_expr,x_7(D),y_8(D)} (0001) > Inserted _12 = x_7(D) + y_8(D); > in predecessor 3 (0001) > Inserted _13 = x_7(D) + y_8(D); > in predecessor 6 (0001) > Created phi prephitmp_14 = PHI <_12(3), _1(5), _13(6)> > in block 7 (0001) > Inserting expression in block 4 for code hoisting: > {plus_expr,x_7(D),y_8(D)} (0001) > Inserted _15 = x_7(D) + y_8(D); > in predecessor 4 (0001) > Inserting expression in block 2 for code hoisting: > {plus_expr,x_7(D),y_8(D)} (0001) > Inserted _16 = x_7(D) + y_8(D); > in predecessor 2 (0001) > > (the hoist insert into BB4 is spurious) > > > Created phi prephitmp_14 = PHI <_13(3), _12(5), _12(6)> > > in block 7 (0001) > > Starting insert iteration 2 > > Inserting expression in block 2 for code hoisting: > > {plus_expr,x_7(D),y_8(D)} (0001) > > Inserted _15 = x_7(D) + y_8(D); > > in predecessor 2 (0001) > > > > So IIUC, during first insert iteration, it is hoisting x + y into bb4 > > and PRE inserts x + y into bb3. > > And during second iteration, it hoists x + y into bb2. > > So we are effectively hoisting x + y from bb5, bb7 into bb2, which > > doesn't seem to benefit other > > two paths (bb2->bb3->bb7 and bb2->bb4->bb6->bb7) since they don't contain x > > + y. > > I am not sure if we should be hoisting x + y into bb2 for this case ? > > We should. We're optimizing the partially redundant compute in BB5 > and the insertion point with the least amount of code generated is > in BB2. Right, thanks for the clarification and sorry for the noise.
Regards, Prathamesh > > Richard.