Richard,

I've fixed adding virtual phi argument and add check on irreducible basic block.
New patch is attached.

I checked it for bootstrap and regression testing, no new failures.

ChangeLog:
2015-10-07  Yuri Rumyantsev  <ysrum...@gmail.com>

* tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
"cfghooks.h", add prototypes for introduced new functions.
(tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
checks on ability of loop unswitching to tree_unswitch_single_loop;
invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
on innermost loop check.
(tree_unswitch_single_loop): Add all required checks on ability of
loop unswitching under zero recursive level guard.
(tree_unswitch_outer_loop): New function.
(find_loop_guard): Likewise.
(empty_bb_without_guard_p): Likewise.
(used_outside_loop_p): Likewise.
(get_vop_from_header): Likewise.
(hoist_guard): Likewise.
(check_exit_phi): Likewise.

   gcc/testsuite/ChangeLog:
* gcc.dg/loop-unswitch-2.c: New test.
* gcc.dg/loop-unswitch-3.c: Likewise.
* gcc.dg/loop-unswitch-4.c: Likewise.


2015-10-06 15:21 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
> On Tue, Oct 6, 2015 at 1:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Richard,
>>
>> Here is updated patch which reflects almost all your remarks:
>> 1. Use ordinary get_loop_body.
>> 2. Delete useless asserts.
>> 3. Use check on iterated loop instead of finite_loop_p.
>> 4. Do not update CFG by adjusting the CONDs condition to always true/false.
>> 5. Add couple tests.
>
> +  /* Add NEW_ADGE argument for all phi in post-header block.  */
> +  bb = exit->dest;
> +  for (gphi_iterator gsi = gsi_start_phis (bb);
> +       !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gphi *phi = gsi.phi ();
> +      /* edge_iterator ei; */
> +      tree arg;
> +      if (virtual_operand_p (gimple_phi_result (phi)))
> +       {
> +         arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
> +         add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>
> now I know what confused me - here you are looking at a loop exit PHI
> but querying with the preheader edge index.  I think you need to walk
> the loop header PHIs to find the PHI for the virtual operand and use that
> to get the PHI arg from?
>
> The side-effect / used-outside code is still the same.  What matters
> is side-effects outside of the loop-header protected code region, not
> blocks excluding the inner loop.  Say,
>
>   for (;;)
>     {
>       if (invariant-guard)
>         {
>            printf ("Blah");
>            for (;;)
>              ;
>         }
>     }
>
> would still ok to be unswitched.  So instead of
>
> +      if (body[i]->loop_father != loop)
> +       continue;
>
> it would be
>
>        if (dominated_by_p (CDI_DOMINATORS, body[i], header)
>            && !dominated_by_p (CDI_DOMINATORS, body[i], fe->dest))
>
> with the obvious improvement to the patch to not only consider header checks
> in the outer loop header but in the pre-header block of the inner loop.
>
> And I still think you should walk the exit PHIs args to see whether they
> are defined in the non-guarded region of the outer loop instead of walking
> all uses of all defs.
>
> Note that I think you miss endless loops as side-effects if that endless
> loop occurs through a irreducible region (thus not reflected in the
> loop tree).  Thus you should reject BB_IRREDUCIBLE_LOOP blocks
> in the non-guarded region as well.
>
> It seems to me that protecting adjacent loops with a single guard is
> also eligible for hoisting thus the restriction on loop->inner->next
> should become a restriction on no loops (or irreducible regions)
> in the non-guarded region.
>
> Most things can be improved as followup, but at least the
> virtual PHI arg thing needs to be sorted out.
>
> Thanks,
> Richard.
>
>
>> ChangeLog:
>> 2015-10-06  Yuri Rumyantsev  <ysrum...@gmail.com>
>>
>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
>> "cfghooks.h", add prototypes for introduced new functions.
>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
>> checks on ability of loop unswitching to tree_unswitch_single_loop;
>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
>> on innermost loop check.
>> (tree_unswitch_single_loop): Add all required checks on ability of
>> loop unswitching under zero recursive level guard.
>> (tree_unswitch_outer_loop): New function.
>> (find_loop_guard): Likewise.
>> (empty_bb_without_guard_p): Likewise.
>> (used_outside_loop_p): Likewise.
>> (hoist_guard): Likewise.
>> (check_exit_phi): Likewise.
>>
>>    gcc/testsuite/ChangeLog:
>> * gcc.dg/loop-unswitch-2.c: New test.
>> * gcc.dg/loop-unswitch-3.c: Likewise.
>> * gcc.dg/loop-unswitch-4.c: Likewise.
>>
>> 2015-10-06 10:59 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>> On Mon, Oct 5, 2015 at 3:13 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>> Thanks Richard.
>>>> I'd like to answer on your last comment related to using of exit edge
>>>> argument for edge that skips loop.
>>>> Let's consider the following test-case:
>>>>
>>>> #include <stdlib.h>
>>>> #define N 32
>>>> float *foo(int ustride, int size, float *src)
>>>> {
>>>>    float *buffer, *p;
>>>>    int i, k;
>>>>
>>>>    if (!src)
>>>>     return NULL;
>>>>
>>>>    buffer = (float *) malloc(N * size * sizeof(float));
>>>>
>>>>    if(buffer)
>>>>       for(i=0, p=buffer; i<N; i++, src+=ustride)
>>>> for(k=0; k<size; k++)
>>>>  *p++ = src[k];
>>>>
>>>>    return buffer;
>>>> }
>>>>
>>>> Before adding new edge we have in post-header bb:
>>>>   <bb 9>:
>>>>   # _6 = PHI <0B(8), buffer_20(16)>
>>>>   return _6;
>>>>
>>>> It is clear that we must preserve function semantic and transform it to
>>>> _6 = PHI <0B(12), buffer_19(9), buffer_19(4)>
>>>
>>> Ah, yeah.  I was confusing the loop exit of the inner vs. the outer loop.
>>>
>>> Richard.
>>>
>>>>
>>>> 2015-10-05 13:57 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>> On Wed, Sep 30, 2015 at 12:46 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>> wrote:
>>>>>> Hi Richard,
>>>>>>
>>>>>> I re-designed outer loop unswitching using basic idea of 23855 patch -
>>>>>> hoist invariant guard if loop is empty without guard. Note that this
>>>>>> was added to loop unswitching pass with simple modifications - using
>>>>>> another loop iterator etc.
>>>>>>
>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>> What is your opinion?
>>>>>
>>>>> Overall it looks good.  Some comments below - a few more testcases would
>>>>> be nice as well.
>>>>>
>>>>> +  /* Loop must not be infinite.  */
>>>>> +  if (!finite_loop_p (loop))
>>>>> +    return false;
>>>>>
>>>>> why's that?
>>>>>
>>>>> +  body = get_loop_body_in_dom_order (loop);
>>>>> +  for (i = 0; i < loop->num_nodes; i++)
>>>>> +    {
>>>>> +      if (body[i]->loop_father != loop)
>>>>> +       continue;
>>>>> +      if (!empty_bb_without_guard_p (loop, body[i]))
>>>>>
>>>>> I wonder if there is a better way to iterate over the interesting
>>>>> blocks and PHIs
>>>>> we need to check for side-effects (and thus we maybe can avoid gathering
>>>>> the loop in DOM order).
>>>>>
>>>>> +      FOR_EACH_SSA_TREE_OPERAND (name, stmt, op_iter, SSA_OP_DEF)
>>>>> +       {
>>>>> +         if (may_be_used_outside
>>>>>
>>>>> may_be_used_outside can be hoisted above the loop.  I wonder if we can 
>>>>> take
>>>>> advantage of loop-closed SSA form here (and the fact we have a single exit
>>>>> from the loop).  Iterating over exit dest PHIs and determining whether the
>>>>> exit edge DEF is inside the loop part it may not be should be enough.
>>>>>
>>>>> +  gcc_assert (single_succ_p (pre_header));
>>>>>
>>>>> that should be always true.
>>>>>
>>>>> +  gsi_remove (&gsi, false);
>>>>> +  bb = guard->dest;
>>>>> +  remove_edge (guard);
>>>>> +  /* Update dominance for destination of GUARD.  */
>>>>> +  if (EDGE_COUNT (bb->preds) == 0)
>>>>> +    {
>>>>> +      basic_block s_bb;
>>>>> +      gcc_assert (single_succ_p (bb));
>>>>> +      s_bb = single_succ (bb);
>>>>> +      delete_basic_block (bb);
>>>>> +      if (single_pred_p (s_bb))
>>>>> +       set_immediate_dominator (CDI_DOMINATORS, s_bb, single_pred 
>>>>> (s_bb));
>>>>>
>>>>> all this massaging should be simplified by leaving it to CFG cleanup by
>>>>> simply adjusting the CONDs condition to always true/false.  There is
>>>>> gimple_cond_make_{true,false} () for this (would be nice to have a variant
>>>>> taking a bool).
>>>>>
>>>>> +  new_edge = make_edge (pre_header, exit->dest, flags);
>>>>> +  if (fix_dom_of_exit)
>>>>> +    set_immediate_dominator (CDI_DOMINATORS, exit->dest, pre_header);
>>>>> +  update_stmt (gsi_stmt (gsi));
>>>>>
>>>>> the update_stmt should be not necessary, it's done by gsi_insert_after 
>>>>> already.
>>>>>
>>>>> +  /* Add NEW_ADGE argument for all phi in post-header block.  */
>>>>> +  bb = exit->dest;
>>>>> +  for (gphi_iterator gsi = gsi_start_phis (bb);
>>>>> +       !gsi_end_p (gsi); gsi_next (&gsi))
>>>>> +    {
>>>>> +      gphi *phi = gsi.phi ();
>>>>> +      /* edge_iterator ei; */
>>>>> +      tree arg;
>>>>> +      if (virtual_operand_p (gimple_phi_result (phi)))
>>>>> +       {
>>>>> +         arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
>>>>> +         add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>>>>> +       }
>>>>> +      else
>>>>> +       {
>>>>> +         /* Use exit edge argument.  */
>>>>> +         arg = PHI_ARG_DEF_FROM_EDGE (phi, exit);
>>>>> +         add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>>>>>
>>>>> Hum.  How is it ok to use the exit edge argument for the edge that skips
>>>>> the loop?  Why can't you always use the pre-header edge value?
>>>>> That is, if we have
>>>>>
>>>>>  for(i=0;i<m;++i)
>>>>>    {
>>>>>      if (n > 0)
>>>>>     {
>>>>>      for (;;)
>>>>>        {
>>>>>        }
>>>>>      }
>>>>>    }
>>>>>   ... = i;
>>>>>
>>>>> then i is used after the loop and the correct value to use if
>>>>> n > 0 is false is '0'.  Maybe this way we can also relax
>>>>> what check_exit_phi does?  IMHO the only restriction is
>>>>> if sth defined inside the loop before the header check for
>>>>> the inner loop is used after the loop.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> ChangeLog:
>>>>>> 2015-09-30  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>
>>>>>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
>>>>>> "cfghooks.h", add prototypes for introduced new functions.
>>>>>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
>>>>>> checks on ability of loop unswitching to tree_unswitch_single_loop;
>>>>>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
>>>>>> on innermost loop check.
>>>>>> (tree_unswitch_single_loop): Add all required checks on ability of
>>>>>> loop unswitching under zero recursive level guard.
>>>>>> (tree_unswitch_outer_loop): New function.
>>>>>> (find_loop_guard): Likewise.
>>>>>> (empty_bb_without_guard_p): Likewise.
>>>>>> (used_outside_loop_p): Likewise.
>>>>>> (hoist_guard): Likewise.
>>>>>> (check_exit_phi): Likewise.
>>>>>>
>>>>>>    gcc/testsuite/ChangeLog:
>>>>>> * gcc.dg/loop-unswitch-2.c: New test.
>>>>>>
>>>>>> 2015-09-16 11:26 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>> Yeah, as said, the patch wasn't fully ready and it also felt odd to do
>>>>>>> this hoisting in loop header copying.  Integrating it
>>>>>>> with LIM would be a better fit eventually.
>>>>>>>
>>>>>>> Note that we did agree to go forward with your original patch just
>>>>>>> making it more "generically" perform outer loop
>>>>>>> unswitching.  Did you explore that idea further?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>> wrote:
>>>>>>>> Thanks Richard.
>>>>>>>>
>>>>>>>> I found one more issue that could not be fixed simply. In 23855 you
>>>>>>>> consider the following test-case:
>>>>>>>> void foo(int *ie, int *je, double *x)
>>>>>>>> {
>>>>>>>>   int i, j;
>>>>>>>>   for (j=0; j<*je; ++j)
>>>>>>>>     for (i=0; i<*ie; ++i)
>>>>>>>>       x[i+j] = 0.0;
>>>>>>>> }
>>>>>>>> and proposed to hoist up a check on *ie out of loop. It requires
>>>>>>>> memref alias analysis since in general x and ie can alias (if their
>>>>>>>> types are compatible - int *ie & int * x). Such analysis is performed
>>>>>>>> by pre or lim passes. Without such analysis we can not hoist a test on
>>>>>>>> non-zero for *ie out of loop using 238565 patch.
>>>>>>>>  The second concern is that proposed copy header algorithm changes
>>>>>>>> loop structure significantly and it is not accepted by vectorizer
>>>>>>>> since latch is not empty (such transformation assumes loop peeling for
>>>>>>>> one iteration. So I can propose to implement simple guard hoisting
>>>>>>>> without copying header and tail blocks (if it is possible).
>>>>>>>>
>>>>>>>> I will appreciate you for any advice or help since without such
>>>>>>>> hoisting we are not able to perform outer loop vectorization for
>>>>>>>> important benchmark.
>>>>>>>> and
>>>>>>>>
>>>>>>>> 2015-09-15 14:22 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>>>> wrote:
>>>>>>>>>> Hi Richard,
>>>>>>>>>>
>>>>>>>>>> I started learning, tuning and debugging patch proposed in 23855 and
>>>>>>>>>> discovered thta it does not work properly.
>>>>>>>>>> So I wonder is it tested patch and it should work?
>>>>>>>>>
>>>>>>>>> I don't remember, but as it wasn't committed it certainly wasn't 
>>>>>>>>> ready.
>>>>>>>>>
>>>>>>>>>> Should it accept for hoisting the following loop nest
>>>>>>>>>>   for (i=0; i<n; i++) {
>>>>>>>>>>     s = 0;
>>>>>>>>>>     for (j=0; j<m; j++)
>>>>>>>>>>       s += a[i] * b[j];
>>>>>>>>>>     c[i] = s;
>>>>>>>>>>   }
>>>>>>>>>> Note that i-loop will nit be empty if m is equal to 0.
>>>>>>>>>
>>>>>>>>> if m is equal to 0 then we still have the c[i] = s store, no?  Of 
>>>>>>>>> course
>>>>>>>>> we could unswitch the outer loop on m == 0 but simple hoisting 
>>>>>>>>> wouldn't work.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> 2015-08-03 10:27 GMT+03:00 Richard Biener 
>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev 
>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> I learned your updated patch for 23825 and it is more general in
>>>>>>>>>>>> comparison with my.
>>>>>>>>>>>> I'd like to propose you a compromise - let's consider my patch only
>>>>>>>>>>>> for force-vectorize outer loop only to allow outer-loop
>>>>>>>>>>>> vecctorization.
>>>>>>>>>>>
>>>>>>>>>>> I don't see why we should special-case that if the approach in 23825
>>>>>>>>>>> is sensible.
>>>>>>>>>>>
>>>>>>>>>>>> Note that your approach will not hoist invariant
>>>>>>>>>>>> guards if loops contains something else except for inner-loop, 
>>>>>>>>>>>> i.e. it
>>>>>>>>>>>> won't be empty for taken branch.
>>>>>>>>>>>
>>>>>>>>>>> Yes, it does not perform unswitching but guard hoisting.  Note that 
>>>>>>>>>>> this
>>>>>>>>>>> is originally Zdenek Dvoraks patch.
>>>>>>>>>>>
>>>>>>>>>>>> I also would like to answer on your last question - CFG cleanup is
>>>>>>>>>>>> invoked to perform deletion of single-argument phi nodes from tail
>>>>>>>>>>>> block through substitution - such phi's prevent outer-loop
>>>>>>>>>>>> vectorization. But it is clear that such transformation can be done
>>>>>>>>>>>> other pass.
>>>>>>>>>>>
>>>>>>>>>>> Hmm, I wonder why the copy_prop pass after unswitching does not
>>>>>>>>>>> get rid of them?
>>>>>>>>>>>
>>>>>>>>>>>> What is your opinion?
>>>>>>>>>>>
>>>>>>>>>>> My opinion is that if we want to enhance unswitching to catch this
>>>>>>>>>>> (or similar) cases then we should make it a lot more general than
>>>>>>>>>>> your pattern-matching approach.  I see nothing that should prevent
>>>>>>>>>>> us from considering unswitching non-innermost loops in general.
>>>>>>>>>>> It should be only a cost consideration to not do non-innermost loop
>>>>>>>>>>> unswitching (in addition to maybe a --param specifying the maximum
>>>>>>>>>>> depth of a loop nest to unswitch).
>>>>>>>>>>>
>>>>>>>>>>> So my first thought when seeing your patch still holds - the patch
>>>>>>>>>>> looks very much too specific.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-07-28 13:50 GMT+03:00 Richard Biener 
>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev 
>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I checked that both test-cases from 23855 are sucessfully 
>>>>>>>>>>>>>> unswitched
>>>>>>>>>>>>>> by proposed patch. I understand that it does not catch deeper 
>>>>>>>>>>>>>> loop
>>>>>>>>>>>>>> nest as
>>>>>>>>>>>>>>    for (i=0; i<10; i++)
>>>>>>>>>>>>>>      for (j=0;j<n;j++)
>>>>>>>>>>>>>>         for (k=0;k<20;k++)
>>>>>>>>>>>>>>   ...
>>>>>>>>>>>>>> but duplication of middle-loop does not look reasonable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is dump for your second test-case:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> void foo(int *ie, int *je, double *x)
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>   int i, j;
>>>>>>>>>>>>>>   for (j=0; j<*je; ++j)
>>>>>>>>>>>>>>     for (i=0; i<*ie; ++i)
>>>>>>>>>>>>>>       x[i+j] = 0.0;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> grep -i unswitch t6.c.119t.unswitch
>>>>>>>>>>>>>> ;; Unswitching outer loop
>>>>>>>>>>>>>
>>>>>>>>>>>>> I was saying that why go with a limited approach when a patch (in
>>>>>>>>>>>>> unknown state...)
>>>>>>>>>>>>> is available that does it more generally?  Also unswitching is 
>>>>>>>>>>>>> quite
>>>>>>>>>>>>> expensive compared
>>>>>>>>>>>>> to "moving" the invariant condition.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In your patch:
>>>>>>>>>>>>>
>>>>>>>>>>>>> +  if (!nloop->force_vectorize)
>>>>>>>>>>>>> +    nloop->force_vectorize = true;
>>>>>>>>>>>>> +  if (loop->safelen != 0)
>>>>>>>>>>>>> +    nloop->safelen = loop->safelen;
>>>>>>>>>>>>>
>>>>>>>>>>>>> I see no guard on force_vectorize so = true looks bogus here.  
>>>>>>>>>>>>> Please just use
>>>>>>>>>>>>> copy_loop_info.
>>>>>>>>>>>>>
>>>>>>>>>>>>> +  if (integer_nonzerop (cond_new))
>>>>>>>>>>>>> +    gimple_cond_set_condition_from_tree (cond_stmt, 
>>>>>>>>>>>>> boolean_true_node);
>>>>>>>>>>>>> +  else if (integer_zerop (cond_new))
>>>>>>>>>>>>> +    gimple_cond_set_condition_from_tree (cond_stmt, 
>>>>>>>>>>>>> boolean_false_node);
>>>>>>>>>>>>>
>>>>>>>>>>>>> gimple_cond_make_true/false (cond_stmt);
>>>>>>>>>>>>>
>>>>>>>>>>>>> btw, seems odd that we have to recompute which loop is the true / 
>>>>>>>>>>>>> false variant
>>>>>>>>>>>>> when we just fed a guard condition to loop_version.  Can't we 
>>>>>>>>>>>>> statically
>>>>>>>>>>>>> determine whether loop or nloop has the in-loop condition true or 
>>>>>>>>>>>>> false?
>>>>>>>>>>>>>
>>>>>>>>>>>>> +  /* Clean-up cfg to remove useless one-argument phi in exit 
>>>>>>>>>>>>> block of
>>>>>>>>>>>>> +     outer-loop.  */
>>>>>>>>>>>>> +  cleanup_tree_cfg ();
>>>>>>>>>>>>>
>>>>>>>>>>>>> I know unswitching is already O(number-of-unswitched-loops * 
>>>>>>>>>>>>> size-of-function)
>>>>>>>>>>>>> because it updates SSA form after each individual unswitching 
>>>>>>>>>>>>> (and it does that
>>>>>>>>>>>>> because it invokes itself recursively on unswitched loops).  But 
>>>>>>>>>>>>> do you really
>>>>>>>>>>>>> need to invoke CFG cleanup here?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2015-07-14 14:06 GMT+03:00 Richard Biener 
>>>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev 
>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here is presented simple transformation which tries to hoist 
>>>>>>>>>>>>>>>> out of
>>>>>>>>>>>>>>>> outer-loop a check on zero trip count for inner-loop. This is 
>>>>>>>>>>>>>>>> very
>>>>>>>>>>>>>>>> restricted transformation since it accepts outer-loops with 
>>>>>>>>>>>>>>>> very
>>>>>>>>>>>>>>>> simple cfg, as for example:
>>>>>>>>>>>>>>>>     acc = 0;
>>>>>>>>>>>>>>>>    for (i = 1; i <= m; i++) {
>>>>>>>>>>>>>>>>       for (j = 0; j < n; j++)
>>>>>>>>>>>>>>>>          if (l[j] == i) { v[j] = acc; acc++; };
>>>>>>>>>>>>>>>>       acc <<= 1;
>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Note that degenerative outer loop (without inner loop) will be
>>>>>>>>>>>>>>>> completely deleted as dead code.
>>>>>>>>>>>>>>>> The main goal of this transformation was to convert outer-loop 
>>>>>>>>>>>>>>>> to form
>>>>>>>>>>>>>>>> accepted by outer-loop vectorization (such test-case is also 
>>>>>>>>>>>>>>>> included
>>>>>>>>>>>>>>>> to patch).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think this is
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> as well.  It has a patch adding a invariant loop guard hoisting
>>>>>>>>>>>>>>> phase to loop-header copying.  Yeah, it needs updating to
>>>>>>>>>>>>>>> trunk again I suppose.  It's always non-stage1 when I come
>>>>>>>>>>>>>>> back to that patch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Your patch seems to be very specific and only handles outer
>>>>>>>>>>>>>>> loops of innermost loops.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>> 2015-07-10  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and
>>>>>>>>>>>>>>>> "gimple-iterator.h", add prototype for 
>>>>>>>>>>>>>>>> tree_unswitch_outer_loop.
>>>>>>>>>>>>>>>> (tree_ssa_unswitch_loops): Add invoke of 
>>>>>>>>>>>>>>>> tree_unswitch_outer_loop.
>>>>>>>>>>>>>>>> (tree_unswitch_outer_loop): New function.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>>>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test.
>>>>>>>>>>>>>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.

Attachment: patch.update2
Description: Binary data

Reply via email to