On Tue, Feb 28, 2012 at 9:33 AM, Jiangning Liu <jiangning....@arm.com> wrote: > > >> -----Original Message----- >> From: Jiangning Liu >> Sent: Tuesday, February 28, 2012 4:07 PM >> To: Jiangning Liu; 'William J. Schmidt' >> Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org >> Subject: RE: A question about redundant PHI expression stmt >> >> >> >> > -----Original Message----- >> > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf >> Of >> > Jiangning Liu >> > Sent: Tuesday, February 28, 2012 11:19 AM >> > To: 'William J. Schmidt' >> > Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org >> > Subject: RE: A question about redundant PHI expression stmt >> > >> > >> > >> > > -----Original Message----- >> > > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On >> Behalf >> > Of >> > > William J. Schmidt >> > > Sent: Monday, February 27, 2012 11:32 PM >> > > To: Jiangning Liu >> > > Cc: gcc@gcc.gnu.org; wschm...@gcc.gnu.org >> > > Subject: Re: A question about redundant PHI expression stmt >> > > >> > > On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote: >> > > > Hi, >> > > > >> > > > For the small case below, there are some redundant PHI expression >> > > stmt >> > > > generated, and finally cause back-end generates redundant copy >> > > instructions >> > > > due to some reasons around IRA. >> > > > >> > > > int *l, *r, *g; >> > > > void test_func(int n) >> > > > { >> > > > int i; >> > > > static int j; >> > > > static int pos, direction, direction_pre; >> > > > >> > > > pos = 0; >> > > > direction = 1; >> > > > >> > > > for ( i = 0; i < n; i++ ) >> > > > { >> > > > direction_pre = direction; >> > > > >> > > > for ( j = 0; j <= 400; j++ ) >> > > > { >> > > > >> > > > if ( g[pos] == 200 ) >> > > > break; >> > > > >> > > > if ( direction == 0 ) >> > > > pos = l[pos]; >> > > > else >> > > > pos = r[pos]; >> > > > >> > > > if ( pos == -1 ) >> > > > { >> > > > if ( direction_pre != direction ) >> > > > break; >> > > > >> > > > pos = 0; >> > > > direction = !direction; >> > > > } >> > > > >> > > > } >> > > > >> > > > f(g[pos]); >> > > > } >> > > > } >> > > > >> > > > I know PR39976 has something to do with this case, and check-in >> > > r182140 >> > > > caused big degradation on the real benchmark, but I'm still >> > confusing >> > > around >> > > > this issue. >> > > > >> > > > The procedure is like this, >> > > > >> > > > Loop store motion generates code below, >> > > > >> > > > <bb 6>: >> > > > D.4084_17 = l.4_13 + D.4077_70; >> > > > pos.5_18 = *D.4084_17; >> > > > pos_lsm.34_103 = pos.5_18; >> > > > goto <bb 8>; >> > > > >> > > > <bb 7>: >> > > > D.4088_23 = r.6_19 + D.4077_70; >> > > > pos.7_24 = *D.4088_23; >> > > > pos_lsm.34_104 = pos.7_24; >> > > > >> > > > <bb 8>: >> > > > # prephitmp.29_89 = PHI <pos.5_18(6), pos.7_24(7)> >> > > > # pos_lsm.34_53 = PHI <pos_lsm.34_103(6), pos_lsm.34_104(7)> >> > > > pos.2_25 = prephitmp.29_89; >> > > > if (pos.2_25 == -1) >> > > > goto <bb 9>; >> > > > else >> > > > goto <bb 20>; >> > > > >> > > > And then, copy propagation transforms block 8 it into >> > > > >> > > > <bb 8>: >> > > > # prephitmp.29_89 = PHI <pos.5_18(11), pos.7_24(12)> >> > > > # pos_lsm.34_53 = PHI <pos.5_18(11), pos.7_24(12)> >> > > > ... >> > > > >> > > > And then, these two duplicated PHI stmts have been kept until >> > expand >> > > pass. >> > > > >> > > > In dom2, a stmt like below >> > > > >> > > > # pos_lsm.34_54 = PHI <pos_lsm.34_53(13), 0(16)> >> > > > >> > > > is transformed into >> > > > >> > > > # pos_lsm.34_54 = PHI <prephitmp.29_89(13), 0(16)> >> > > > >> > > > just because they are equal. >> > > > >> > > > Unfortunately, this transformation changed back-end behavior to >> > > generate >> > > > redundant copy instructions and hurt performance. This case is >> from >> > a >> > > real >> > > > benchmark and hurt performance a lot. >> > > > >> > > > Does the fix in r182140 intend to totally clean up this kind of >> > > redundancy? >> > > > Where should we get it fixed? >> > > > >> > > >> > > Hi, sorry not to have responded sooner -- I just now got some time >> to >> > > look at this. >> > > >> > > I compiled your code with -O3 for revisions 182139 and 182140 of >> GCC >> > > trunk, and found no difference between the code produced by the >> > middle >> > > end for the two versions. So something else has apparently come >> > along >> > > since then that helped produce the problematic code generation for >> > you. >> > > Either that or I need to use different compile flags; you didn't >> > > specify >> > > what you used. >> > > >> > > The fix in r182140 does just what you saw in dom2: identifies >> > > duplicate >> > > PHIs in the same block and ensures only one of them is used. This >> > > actually avoids inserting extra blocks during expand in certain >> loop >> > > cases. I am not sure why you are getting redundant copies as a >> > result, >> > > but it sounds from your comments like IRA didn't coalesce a >> register >> > > copy or something like that. You may want to bisect revisions on >> the >> > > trunk to see where your bad code generation started to occur to get >> a >> > > better handle on what happened. >> > > >> > > As Richard said, the dom pass is likely to be removed someday, >> > whenever >> > > someone can get around to it. My redundant-phi band-aid in dom >> would >> > > go >> > > away then as well, but presumably an extra pass of PRE would >> replace >> > it, >> > > and redundant PHIs would still be removed. >> > > >> > >> > Bill, >> > >> > Thanks a lot for your explanation! >> > >> > The bug could be exposed with -O2 if you apply the check-in r184435 >> > made by Richard. >> > >> > BTW, at present is there anybody working on the NEW pass replacing >> dom? >> > If no, in short term, it would be good to still get it fixed just >> > because it is hurting benchmark. If Yes, may I know what that NEW >> pass >> > will be focusing on? >> > >> >> With dump enabled, I do see a PHI copy is identified by >> tree_ssa_dominator_optimize as below, >> >> Optimizing block #12 >> >> LKUP STMT prephitmp.29_89 = PHI <pos.5_18, pos.7_24> >> prephitmp.29_89 = PHI <pos.5_18(10), pos.7_24(11)> >> 2>>> STMT prephitmp.29_89 = PHI <pos.5_18, pos.7_24> >> prephitmp.29_89 = PHI <pos.5_18(10), pos.7_24(11)> >> LKUP STMT pos_lsm.34_53 = PHI <pos.5_18, pos.7_24> >> pos_lsm.34_53 = PHI <pos.5_18(10), pos.7_24(11)> >> FIND: prephitmp.29_89 >> 0>>> COPY pos_lsm.34_53 = prephitmp.29_89 >> <<<< STMT prephitmp.29_89 = PHI <pos.5_18, pos.7_24> >> prephitmp.29_89 = PHI <pos.5_18(10), pos.7_24(11)> >> Optimizing statement if (prephitmp.29_89 == -1) >> LKUP STMT prephitmp.29_89 eq_expr -1 >> if (prephitmp.29_89 == -1) >> >> So does it mean phicprop (eliminate_degenerate_phis) needs to be >> improved as well? >> > > One more clue I find is, after dom2, prephitmp.29_89 isn't really propagated > and pos_lsm.34_53 is still being used by block 15 as below, although block 15 > is dominated by block 12. > > <bb 15>: > # direction_lsm.33_121 = PHI <direction_lsm.33_51(14)> > # j_lsm.32_122 = PHI <j_lsm.32_52(14)> > # pos_lsm.34_123 = PHI <pos_lsm.34_53(14)> > > Is this the root cause that redundant PHI stmt "pos_lsm.34_53 = PHI > <pos.5_18(10), pos.7_24(11)>" can't be removed?
Well, check why cprop_into_stmt does nothing here. It should already work. Richard.