On Thu, 6 May 2021, Prathamesh Kulkarni wrote: > On Thu, 6 May 2021 at 15:43, Richard Biener <rguent...@suse.de> wrote: > > > > On Thu, 6 May 2021, Prathamesh Kulkarni wrote: > > > > > Hi Richard, > > > I was just wondering if second (and higher) order hoistings may defeat > > > the "AVAIL_OUT in at least one successor heuristic" ? > > > > > > For eg: > > > bb2: > > > if (cond1) goto bb3 else goto bb4; > > > > > > bb3: > > > if (cond2) goto bb5 else goto bb6; > > > > > > bb5: > > > return x + y; > > > > > > bb6: > > > return x + y; > > > > > > bb4: > > > if (cond3) goto bb7 else goto bb8; > > > > > > bb7: > > > return x + y; > > > > > > bb8: > > > return x + y; > > > > > > In above CFG, IIUC, we will end up hoisting x + y in bb2 even if bb3 > > > or bb4 don't originally > > > contain x + y since during first hoisting pass x + y will be hoisted > > > from bb5, bb6 into bb3, > > > and likewise from bb7, bb8 into bb4 and during second hoisting pass it > > > will hoist x + y into bb2, since x + y is now available in bb3 and > > > bb4. > > > > But that's intended - the logic is _supposed_ to do "everything > > at once", thus repeated runs of hoisting should not hoist more. > > Which is also why we no longer iterate hoist insertion. > > > > > This might become an issue if successive hoistings will end up > > > hoisting the expression "too far" resulting in long range hoisting ? > > > > I think this "long range hoisting" argument is a red herring... > > > > > Also if we're hoisting from post-dom block in which case the > > > expression may not be truly ANTIC, and eventually end up being > > > inserted into a successor block by successive hoisting passes. > > > > But this shouldn't happen - can you show a testcase where it does? > Well, I was thinking of this test-case: > > int f(int cond1, int cond2, int cond3, int x, int y) > { > void f1(); > void f2(int); > void f3(); > > if (cond1) > f1 (); > else > { > if (cond2) > f2 (x + y); > else > f3 (); > } > > return x + y; > } > > The input to PRE pass is: > > f (int cond1, int cond2, int cond3, int x, int y) > { > int _1; > int _11; > > <bb 2> [local count: 1073741824]: > if (cond1_3(D) != 0) > goto <bb 3>; [33.00%] > else > goto <bb 4>; [67.00%] > > <bb 3> [local count: 354334800]: > f1 (); > goto <bb 7>; [100.00%] > > <bb 4> [local count: 719407025]: > if (cond2_4(D) != 0) > goto <bb 5>; [50.00%] > else > goto <bb 6>; [50.00%] > > <bb 5> [local count: 359703512]: > _1 = x_7(D) + y_8(D); > f2 (_1); > goto <bb 7>; [100.00%] > > <bb 6> [local count: 359703512]: > f3 (); > > <bb 7> [local count: 1073741824]: > _11 = x_7(D) + y_8(D); > return _11; > } > > pre dump shows: > Inserting expression in block 4 for code hoisting: > {plus_expr,x_7(D),y_8(D)} (0001) > Inserted _12 = x_7(D) + y_8(D); > in predecessor 4 (0001) > Found partial redundancy for expression {plus_expr,x_7(D),y_8(D)} (0001) > Inserted _13 = x_7(D) + y_8(D); > in predecessor 3 (0001)
You are clearly looking at old GCC - GCC 11+ first do all PRE insertion and only then do a _single_ hoist insert run. But still it happens exactly as designed - there's a partial redundancy for the x + y add on BB7 predecessors so we insert in BB6 and BB3. That in turn leaves us with a perfect opportunity for hoisting where I don't see any reason do _not_ perform it since it saves us two copies of x + y. Trunk: Starting iteration 1 Starting insert iteration 1 Found partial redundancy for expression {plus_expr,x_7(D),y_8(D)} (0001) Inserted _12 = x_7(D) + y_8(D); in predecessor 3 (0001) Inserted _13 = x_7(D) + y_8(D); in predecessor 6 (0001) Created phi prephitmp_14 = PHI <_12(3), _1(5), _13(6)> in block 7 (0001) Inserting expression in block 4 for code hoisting: {plus_expr,x_7(D),y_8(D)} (0001) Inserted _15 = x_7(D) + y_8(D); in predecessor 4 (0001) Inserting expression in block 2 for code hoisting: {plus_expr,x_7(D),y_8(D)} (0001) Inserted _16 = x_7(D) + y_8(D); in predecessor 2 (0001) (the hoist insert into BB4 is spurious) > Created phi prephitmp_14 = PHI <_13(3), _12(5), _12(6)> > in block 7 (0001) > Starting insert iteration 2 > Inserting expression in block 2 for code hoisting: > {plus_expr,x_7(D),y_8(D)} (0001) > Inserted _15 = x_7(D) + y_8(D); > in predecessor 2 (0001) > > So IIUC, during first insert iteration, it is hoisting x + y into bb4 > and PRE inserts x + y into bb3. > And during second iteration, it hoists x + y into bb2. > So we are effectively hoisting x + y from bb5, bb7 into bb2, which > doesn't seem to benefit other > two paths (bb2->bb3->bb7 and bb2->bb4->bb6->bb7) since they don't contain x + > y. > I am not sure if we should be hoisting x + y into bb2 for this case ? We should. We're optimizing the partially redundant compute in BB5 and the insertion point with the least amount of code generated is in BB2. Richard.