On 25/04/13 16:19, Richard Biener wrote: <SNIP> > and compared to the previous patch changed the tree-ssa-tailmerge.c > part to deal with merging of loop latch and loop preheader (even > if that's a really bad idea) to not regress gcc.dg/pr50763.c. > Any suggestion on how to improve that part welcome. <SNIP> > * tree-ssa-tail-merge.c: Include cfgloop.h. > (replace_block_by): When merging loop latches mark loops for fixup. <SNIP> > Index: trunk/gcc/tree-ssa-tail-merge.c > =================================================================== > *** trunk.orig/gcc/tree-ssa-tail-merge.c 2013-04-25 11:31:14.000000000 > +0200 > --- trunk/gcc/tree-ssa-tail-merge.c 2013-04-25 12:39:00.236390580 +0200 > *************** along with GCC; see the file COPYING3. > *** 197,202 **** > --- 197,203 ---- > #include "gimple-pretty-print.h" > #include "tree-ssa-sccvn.h" > #include "tree-dump.h" > + #include "cfgloop.h" > > /* ??? This currently runs as part of tree-ssa-pre. Why is this not > a stand-alone GIMPLE pass? */ > *************** replace_block_by (basic_block bb1, basic > *** 1459,1464 **** > --- 1460,1476 ---- > /* Mark the basic block as deleted. */ > mark_basic_block_deleted (bb1); > > + /* ??? If we merge the loop preheader with the loop latch we are creating > + additional entries into the loop, eventually rotating it. > + Mark loops for fixup in this case. > + ??? This is a completely unwanted transform and will wreck most > + loops at this point - but with just not considering loop latches as > + merge candidates we fail to commonize the two loops in > gcc.dg/pr50763.c. > + A better fix to avoid that regression is needed. */ > + if (current_loops > + && bb2->loop_father->latch == bb2) > + loops_state_set (LOOPS_NEED_FIXUP); > + > /* Redirect the incoming edges of bb1 to bb2. */ > for (i = EDGE_COUNT (bb1->preds); i > 0 ; --i) > { <SNIP>
Richard, I'm not sure if I get your comment about the two loops in pr50763.c. There is just one loop, both before and after tail-merge. BEFORE: 2(e) / \ * * 3 9 \ / * * 4 / \ * * 5 10 | | * * +--*7 8(x) | / \ * \ * * / 6 11 TAIL-MERGE: 1. merges empty blocks 10 and 11 2. merges empty block 5 and 6 3. merges block 4 and 7, which are empty except for testing the conditional, The transformations in steps 2 and 3 affect the loop. AFTER: 2(e) / \ * * 3 9 \ / * * +-*4,7 | / \ \ * * 5,6 10,11 \ * 8(x) Although step 2 and 3 reduce the amount of BBs, which could make sense for compile-for-size, I wonder whether this transformation works in general. Step 3 only works if the same conditional is tested, which means an eternal loop. Step 2 works if the loop pre-header and the loop latch are empty. This will be the case quite often since loop_optimizer_init is called with LOOPS_HAVE_PREHEADERS | LOOPS_HAVE_SIMPLE_LATCHES before pass_pre. OTOH, loops will typically have a non-virtual phi, with different values coming from the loop pre-header and the loop latch, which prevents the optimization. So I think this is really a cornercase, and we should disregard it if that makes things simpler. Rather than fixing up the loop structure, we could prevent tail-merge in these cases. The current fix tests for current_loops == NULL, and I'm not sure that can still happen there, given that we have PROP_loops. It's not evident to me that the test bb2->loop_father->latch == bb2 is sufficient. Before calling tail_merge_optimize, we call loop_optimizer_finalize in which we assert that LOOPS_MAY_HAVE_MULTIPLE_LATCHES from there on, so in theory we might miss some latches. But I guess that pre (having started out with simple latches) maintains simple latches throughout, and that tail-merge does the same. Tentative patch attached. I'll try build & test. [ Btw, it would be nice if restricting the optimization also means that we can simplify dominator handling in the pass. ] Thanks, - Tom 2013-04-26 Tom de Vries <t...@codesourcery.com> * tree-ssa-tail-merge.c (find_same_succ_bb): Skip loop latch bbs. (replace_block_by): Don't set LOOPS_NEED_FIXUP. * gcc.dg/pr50763.c: Update test.
diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c index f2ab7444..e49c3e7 100644 --- a/gcc/tree-ssa-tail-merge.c +++ b/gcc/tree-ssa-tail-merge.c @@ -689,7 +689,8 @@ find_same_succ_bb (basic_block bb, same_succ *same_p) edge_iterator ei; edge e; - if (bb == NULL) + if (bb == NULL + || bb->loop_father->latch == bb) return; bitmap_set_bit (same->bbs, bb->index); FOR_EACH_EDGE (e, ei, bb->succs) @@ -1460,17 +1461,6 @@ replace_block_by (basic_block bb1, basic_block bb2) /* Mark the basic block as deleted. */ mark_basic_block_deleted (bb1); - /* ??? If we merge the loop preheader with the loop latch we are creating - additional entries into the loop, eventually rotating it. - Mark loops for fixup in this case. - ??? This is a completely unwanted transform and will wreck most - loops at this point - but with just not considering loop latches as - merge candidates we fail to commonize the two loops in gcc.dg/pr50763.c. - A better fix to avoid that regression is needed. */ - if (current_loops - && bb2->loop_father->latch == bb2) - loops_state_set (LOOPS_NEED_FIXUP); - /* Redirect the incoming edges of bb1 to bb2. */ for (i = EDGE_COUNT (bb1->preds); i > 0 ; --i) { diff --git a/gcc/testsuite/gcc.dg/pr50763.c b/gcc/testsuite/gcc.dg/pr50763.c index 60025e3..695b61c 100644 --- a/gcc/testsuite/gcc.dg/pr50763.c +++ b/gcc/testsuite/gcc.dg/pr50763.c @@ -12,5 +12,5 @@ foo (int c, int d) while (c == d); } -/* { dg-final { scan-tree-dump-times "== 33" 1 "pre"} } */ +/* { dg-final { scan-tree-dump-times "== 33" 2 "pre"} } */ /* { dg-final { cleanup-tree-dump "pre" } } */