On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Hi Richard, > > I learned your updated patch for 23825 and it is more general in > comparison with my. > I'd like to propose you a compromise - let's consider my patch only > for force-vectorize outer loop only to allow outer-loop > vecctorization.
I don't see why we should special-case that if the approach in 23825 is sensible. > Note that your approach will not hoist invariant > guards if loops contains something else except for inner-loop, i.e. it > won't be empty for taken branch. Yes, it does not perform unswitching but guard hoisting. Note that this is originally Zdenek Dvoraks patch. > I also would like to answer on your last question - CFG cleanup is > invoked to perform deletion of single-argument phi nodes from tail > block through substitution - such phi's prevent outer-loop > vectorization. But it is clear that such transformation can be done > other pass. Hmm, I wonder why the copy_prop pass after unswitching does not get rid of them? > What is your opinion? My opinion is that if we want to enhance unswitching to catch this (or similar) cases then we should make it a lot more general than your pattern-matching approach. I see nothing that should prevent us from considering unswitching non-innermost loops in general. It should be only a cost consideration to not do non-innermost loop unswitching (in addition to maybe a --param specifying the maximum depth of a loop nest to unswitch). So my first thought when seeing your patch still holds - the patch looks very much too specific. Richard. > Yuri. > > 2015-07-28 13:50 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Hi Richard, >>> >>> I checked that both test-cases from 23855 are sucessfully unswitched >>> by proposed patch. I understand that it does not catch deeper loop >>> nest as >>> for (i=0; i<10; i++) >>> for (j=0;j<n;j++) >>> for (k=0;k<20;k++) >>> ... >>> but duplication of middle-loop does not look reasonable. >>> >>> Here is dump for your second test-case: >>> >>> void foo(int *ie, int *je, double *x) >>> { >>> int i, j; >>> for (j=0; j<*je; ++j) >>> for (i=0; i<*ie; ++i) >>> x[i+j] = 0.0; >>> } >>> grep -i unswitch t6.c.119t.unswitch >>> ;; Unswitching outer loop >> >> I was saying that why go with a limited approach when a patch (in >> unknown state...) >> is available that does it more generally? Also unswitching is quite >> expensive compared >> to "moving" the invariant condition. >> >> In your patch: >> >> + if (!nloop->force_vectorize) >> + nloop->force_vectorize = true; >> + if (loop->safelen != 0) >> + nloop->safelen = loop->safelen; >> >> I see no guard on force_vectorize so = true looks bogus here. Please just >> use >> copy_loop_info. >> >> + if (integer_nonzerop (cond_new)) >> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node); >> + else if (integer_zerop (cond_new)) >> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node); >> >> gimple_cond_make_true/false (cond_stmt); >> >> btw, seems odd that we have to recompute which loop is the true / false >> variant >> when we just fed a guard condition to loop_version. Can't we statically >> determine whether loop or nloop has the in-loop condition true or false? >> >> + /* Clean-up cfg to remove useless one-argument phi in exit block of >> + outer-loop. */ >> + cleanup_tree_cfg (); >> >> I know unswitching is already O(number-of-unswitched-loops * >> size-of-function) >> because it updates SSA form after each individual unswitching (and it does >> that >> because it invokes itself recursively on unswitched loops). But do you >> really >> need to invoke CFG cleanup here? >> >> Richard. >> >>> Yuri. >>> >>> 2015-07-14 14:06 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>> wrote: >>>>> Hi All, >>>>> >>>>> Here is presented simple transformation which tries to hoist out of >>>>> outer-loop a check on zero trip count for inner-loop. This is very >>>>> restricted transformation since it accepts outer-loops with very >>>>> simple cfg, as for example: >>>>> acc = 0; >>>>> for (i = 1; i <= m; i++) { >>>>> for (j = 0; j < n; j++) >>>>> if (l[j] == i) { v[j] = acc; acc++; }; >>>>> acc <<= 1; >>>>> } >>>>> >>>>> Note that degenerative outer loop (without inner loop) will be >>>>> completely deleted as dead code. >>>>> The main goal of this transformation was to convert outer-loop to form >>>>> accepted by outer-loop vectorization (such test-case is also included >>>>> to patch). >>>>> >>>>> Bootstrap and regression testing did not show any new failures. >>>>> >>>>> Is it OK for trunk? >>>> >>>> I think this is >>>> >>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855 >>>> >>>> as well. It has a patch adding a invariant loop guard hoisting >>>> phase to loop-header copying. Yeah, it needs updating to >>>> trunk again I suppose. It's always non-stage1 when I come >>>> back to that patch. >>>> >>>> Your patch seems to be very specific and only handles outer >>>> loops of innermost loops. >>>> >>>> Richard. >>>> >>>>> ChangeLog: >>>>> 2015-07-10 Yuri Rumyantsev <ysrum...@gmail.com> >>>>> >>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and >>>>> "gimple-iterator.h", add prototype for tree_unswitch_outer_loop. >>>>> (tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop. >>>>> (tree_unswitch_outer_loop): New function. >>>>> >>>>> gcc/testsuite/ChangeLog: >>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test. >>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.