http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52424
Bug #: 52424 Summary: dom prematurely pops entries from const_and_copies stack Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: wschm...@gcc.gnu.org ReportedBy: wschm...@gcc.gnu.org CC: berg...@vnet.ibm.com, jiangning....@arm.com, rguent...@suse.de Created attachment 26775 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26775 Proposed patch of tree-ssa-dom.c Jiangning Liu reported that the following C code has recently experienced degraded performance on trunk. (Jiangning, please fill in the Host/Target/Build fields for your configuration, or tell me what they are if you don't have access. The problem is not related to specific targets in any event.) int *l, *r, *g; void test_func(int n) { int i; static int j; static int pos, direction, direction_pre; pos = 0; direction = 1; for ( i = 0; i < n; i++ ) { direction_pre = direction; for ( j = 0; j <= 400; j++ ) { if ( g[pos] == 200 ) break; if ( direction == 0 ) pos = l[pos]; else pos = r[pos]; if ( pos == -1 ) { if ( direction_pre != direction ) break; pos = 0; direction = !direction; } } f(g[pos]); } } When compiled with -O2, the dom2 details dump shows that a redundant phi is detected in block 12: <bb 12>: # prephitmp.28_89 = PHI <pos.4_18(10), pos.6_24(11)> # pos_lsm.33_53 = PHI <pos.4_18(10), pos.6_24(11)> LKUP STMT prephitmp.28_89 = PHI <pos.4_18, pos.6_24> prephitmp.28_89 = PHI <pos.4_18(10), pos.6_24(11)> 2>>> STMT prephitmp.28_89 = PHI <pos.4_18, pos.6_24> prephitmp.28_89 = PHI <pos.4_18(10), pos.6_24(11)> LKUP STMT pos_lsm.33_53 = PHI <pos.4_18, pos.6_24> pos_lsm.33_53 = PHI <pos.4_18(10), pos.6_24(11)> FIND: prephitmp.28_89 0>>> COPY pos_lsm.33_53 = prephitmp.28_89 However, the COPY is incorrectly removed from the const_and_copy stack after block 13 is processed. It should not be removed until after the last block dominated by bb12 (bb18) is processed. As a result, the copy is not propagated into a PHI in block 15: # pos_lsm.33_123 = PHI <pos_lsm.33_53(14)> As a result, an extra copy is eventually introduced that degrades performance. I tracked down the extra removal of const_and_copy stack entries to tree-ssa-threadedge.c:remove_temporary_equivalences. This removes pairs of entries that were introduced by thread_across_edge, and also removes a NULL marker entry beneath those entries. Thus calls to thread_across_edge must be preceded by a push of such a marker entry. thread_across_edge is called from tree-ssa-dom.c:dom_thread_across_edge, which is called in three places within dom_opt_leave_block. In two of those places, the push of a marker entry is present. In the third, it isn't. Adding the marker push appears to correct the problem. I've attached a proposed fix. Jiangning, can you please apply this and see if your performance problem is resolved? Thanks, Bill