On Fri, Jun 30, 2017 at 5:09 PM, Jeff Law <l...@redhat.com> wrote: > On 06/14/2017 07:08 AM, Bin Cheng wrote: >> Hi, >> Loop split currently generates below control flow graph for split loops: >> + >> + .------ guard1 ------. >> + v v >> + pre1(loop1) .---------->pre2(loop2) >> + | | | >> + .--->h1 | h2<----. >> + | | | | | >> + | ex1---. | .---ex2 | >> + | / v | | \ | >> + '---l1 guard2---' | l2---' >> + | | >> + | | >> + '--->join<---' >> + >> In which, >> + LOOP2 is the second loop after split, GUARD1 and GUARD2 are the two bbs >> + controling if loop2 should be executed. >> >> Take added test case as an example, the second loop only iterates for 1 time, >> as a result, the CFG and loop niter bound information can be refined. In >> general, >> guard1/guard2 can be folded to true/false if loop2's niters is known at >> compilation >> time. This patch does such improvement by analyzing and refining niters of >> loop2; as well as using that information to simplify CFG. With this patch, >> the second split loop as in test can be completely unrolled by later passes. > In your testcase the second loop iterates precisely once, right? In > fact, we know it always executes precisely one time regardless of the > incoming value of LEN. That's the property you're trying to detect and > exploit, right? > > >> >> Bootstrap and test on x86_64 and AArch64. Is it OK? >> >> Thanks, >> bin >> 2017-06-12 Bin Cheng <bin.ch...@arm.com> >> >> * tree-ssa-loop-split.c (compute_new_first_bound): Compute and >> return bound information for the second split loop. >> (adjust_loop_split): New function. >> (split_loop): Update calls to above functions. >> >> gcc/testsuite/ChangeLog >> 2017-06-12 Bin Cheng <bin.ch...@arm.com> >> >> * gcc.dg/loop-split-1.c: New test. >> >> >> 0002-lsplit-refine-cfg-niter-bound-20170601.txt >> >> >> From 61855c74c7db6178008f007198aedee9a03f13e6 Mon Sep 17 00:00:00 2001 >> From: amker <amker@amker-laptop.(none)> >> Date: Sun, 4 Jun 2017 02:26:34 +0800 >> Subject: [PATCH 2/2] lsplit-refine-cfg-niter-bound-20170601.txt >> >> --- >> gcc/testsuite/gcc.dg/loop-split-1.c | 16 +++ >> gcc/tree-ssa-loop-split.c | 197 >> ++++++++++++++++++++++++++++++++---- >> 2 files changed, 194 insertions(+), 19 deletions(-) >> create mode 100644 gcc/testsuite/gcc.dg/loop-split-1.c >> >> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c >> index f8fe8e6..73c7dc2 100644 >> --- a/gcc/tree-ssa-loop-split.c >> +++ b/gcc/tree-ssa-loop-split.c >> @@ -387,28 +386,41 @@ connect_loops (struct loop *loop1, struct loop *loop2) >> >> Depending on the direction of the IVs and if the exit tests >> are strict or non-strict we need to use MIN or MAX, >> - and add or subtract 1. This routine computes newend above. */ >> + and add or subtract 1. This routine computes newend above. >> + >> + After computing the new bound (on j), we may be able to compare the >> + first loop's iteration space against the original loop's. If it is >> + comparable at compilation time, we may be able to compute the niter >> + bound of the second loop. Record the second loop's iteration bound >> + to SECOND_LOOP_NITER_BOUND which has below meaning: >> + >> + -3: Don't know anything about the second loop; >> + -2: The second loop must not be executed; >> + -1: The second loop must be executed, but niter bound is unknown; >> + n: The second loop must be executed, niter bound is n (>= 0); >> + >> + Note we compute niter bound for the second loop's latch. */ > How hard would it be to have a test for each of the 4 cases above? I > don't always ask for this, but ISTM this is a good chance to make sure > most of the new code gets covered by testing. > > Does case -2 really occur in practice or are you just being safe? ISTM > that if case -2 occurs then the loop shouldn't have been split to begin > with. If this happens in practice, do we clean up all the dead code if > the bounds are set properly. > > Similarly for N = 0, presumably meaning the second loop iterates exactly > once, but never traverses its backedge, is there any value in changing > the loop latch test so that it's always false? Or will that get cleaned > up later? > > >> >> static tree >> -compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter, >> - tree border, >> - enum tree_code guard_code, tree guard_init) >> +compute_new_first_bound (struct tree_niter_desc *niter, tree border, >> + enum tree_code guard_code, tree guard_init, >> + tree step, HOST_WIDE_INT *second_loop_niter_bound) >> { >> - /* The niter structure contains the after-increment IV, we need >> - the loop-enter base, so subtract STEP once. */ >> tree controlbase = niter->control.base; >> tree controlstep = niter->control.step; >> tree enddiff, end = niter->bound; >> tree type; >> >> - /* Compute end-beg. */ >> + /* Compute end-beg. >> + Note the niter structure contains the after-increment IV, we need the >> + loop-enter base, so subtract STEP once. */ >> if (POINTER_TYPE_P (TREE_TYPE (controlbase))) >> { >> controlstep = fold_build1 (NEGATE_EXPR, >> TREE_TYPE (controlstep), controlstep); >> enddiff = fold_build_pointer_plus (controlbase, controlstep); >> >> - type = unsigned_type_for (enddiff); >> + type = unsigned_type_for (TREE_TYPE (enddiff)); > Isn't this an indication the trunk sources are broken? Consider pushing > this forward independent of the rest of this change if it looks like > we're going to have to iterate more than once or twice. > > > >> @@ -462,8 +476,148 @@ compute_new_first_bound (gimple_seq *stmts, struct >> tree_niter_desc *niter, >> build_int_cst (type, addbound)); >> } >> >> - newbound = fold_build2 (minmax, TREE_TYPE (border), border, newbound); >> - return force_gimple_operand (newbound, stmts, true, NULL_TREE); >> + tree cmp_rslt = fold_build2 (cmp_code, boolean_type_node, border, >> newbound); >> + /* Only handle simple case with unit step. */ >> + if (wi::eq_p (wi::abs (step), 1) >> + && cmp_rslt != NULL_TREE && integer_nonzerop (cmp_rslt)) >> + { >> + if (integer_nonzerop (cmp_rslt)) >> + { >> + tree niter_type, diff; >> + >> + niter_type = unsigned_type_for (TREE_TYPE (newbound)); >> + /* The split condition indicates smaller iteration space than the >> + original loop, so the second loop must be executed. */ >> + if (cmp_code == LT_EXPR) >> + diff = fold_build2 (MINUS_EXPR, niter_type, >> + fold_convert (niter_type, newbound), >> + fold_convert (niter_type, border)); >> + else >> + diff = fold_build2 (MINUS_EXPR, niter_type, >> + fold_convert (niter_type, border), >> + fold_convert (niter_type, newbound)); >> + >> + /* Check if niter can be computed. */ >> + if (tree_fits_shwi_p (diff) && !tree_int_cst_sign_bit (diff)) >> + *second_loop_niter_bound = tree_to_shwi (diff) - 1; >> + else >> + *second_loop_niter_bound = -1; >> + >> + return border; >> + } >> + else if (integer_zerop (cmp_rslt)) >> + { >> + /* The split condition indicates larger iteration space than the >> + oiginal loop, so the second loop must not be executed. */ > s/oiginal/original/ > > > Overall I like it and it looks reasonably close to being ready. Let's > get the answers to the few questions above and decide how to move > forward after that. Hi, Sorry for being late. I will update patch according to your comments, but firstly I need to rework the first [01/02] patch on which this one is based.
Thanks, bin > > Thanks, > > Jeff