Richard,

I noticed that 'gimple' type was changed and send you updated patch.

Thanks.
Yuri.

2015-10-07 12:53 GMT+03:00 Yuri Rumyantsev <ysrum...@gmail.com>:
> Richard,
>
> I've fixed adding virtual phi argument and add check on irreducible basic 
> block.
> New patch is attached.
>
> I checked it for bootstrap and regression testing, no new failures.
>
> ChangeLog:
> 2015-10-07  Yuri Rumyantsev  <ysrum...@gmail.com>
>
> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
> "cfghooks.h", add prototypes for introduced new functions.
> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
> checks on ability of loop unswitching to tree_unswitch_single_loop;
> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
> on innermost loop check.
> (tree_unswitch_single_loop): Add all required checks on ability of
> loop unswitching under zero recursive level guard.
> (tree_unswitch_outer_loop): New function.
> (find_loop_guard): Likewise.
> (empty_bb_without_guard_p): Likewise.
> (used_outside_loop_p): Likewise.
> (get_vop_from_header): Likewise.
> (hoist_guard): Likewise.
> (check_exit_phi): Likewise.
>
>    gcc/testsuite/ChangeLog:
> * gcc.dg/loop-unswitch-2.c: New test.
> * gcc.dg/loop-unswitch-3.c: Likewise.
> * gcc.dg/loop-unswitch-4.c: Likewise.
>
>
> 2015-10-06 15:21 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>> On Tue, Oct 6, 2015 at 1:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>> Richard,
>>>
>>> Here is updated patch which reflects almost all your remarks:
>>> 1. Use ordinary get_loop_body.
>>> 2. Delete useless asserts.
>>> 3. Use check on iterated loop instead of finite_loop_p.
>>> 4. Do not update CFG by adjusting the CONDs condition to always true/false.
>>> 5. Add couple tests.
>>
>> +  /* Add NEW_ADGE argument for all phi in post-header block.  */
>> +  bb = exit->dest;
>> +  for (gphi_iterator gsi = gsi_start_phis (bb);
>> +       !gsi_end_p (gsi); gsi_next (&gsi))
>> +    {
>> +      gphi *phi = gsi.phi ();
>> +      /* edge_iterator ei; */
>> +      tree arg;
>> +      if (virtual_operand_p (gimple_phi_result (phi)))
>> +       {
>> +         arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
>> +         add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>>
>> now I know what confused me - here you are looking at a loop exit PHI
>> but querying with the preheader edge index.  I think you need to walk
>> the loop header PHIs to find the PHI for the virtual operand and use that
>> to get the PHI arg from?
>>
>> The side-effect / used-outside code is still the same.  What matters
>> is side-effects outside of the loop-header protected code region, not
>> blocks excluding the inner loop.  Say,
>>
>>   for (;;)
>>     {
>>       if (invariant-guard)
>>         {
>>            printf ("Blah");
>>            for (;;)
>>              ;
>>         }
>>     }
>>
>> would still ok to be unswitched.  So instead of
>>
>> +      if (body[i]->loop_father != loop)
>> +       continue;
>>
>> it would be
>>
>>        if (dominated_by_p (CDI_DOMINATORS, body[i], header)
>>            && !dominated_by_p (CDI_DOMINATORS, body[i], fe->dest))
>>
>> with the obvious improvement to the patch to not only consider header checks
>> in the outer loop header but in the pre-header block of the inner loop.
>>
>> And I still think you should walk the exit PHIs args to see whether they
>> are defined in the non-guarded region of the outer loop instead of walking
>> all uses of all defs.
>>
>> Note that I think you miss endless loops as side-effects if that endless
>> loop occurs through a irreducible region (thus not reflected in the
>> loop tree).  Thus you should reject BB_IRREDUCIBLE_LOOP blocks
>> in the non-guarded region as well.
>>
>> It seems to me that protecting adjacent loops with a single guard is
>> also eligible for hoisting thus the restriction on loop->inner->next
>> should become a restriction on no loops (or irreducible regions)
>> in the non-guarded region.
>>
>> Most things can be improved as followup, but at least the
>> virtual PHI arg thing needs to be sorted out.
>>
>> Thanks,
>> Richard.
>>
>>
>>> ChangeLog:
>>> 2015-10-06  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>
>>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
>>> "cfghooks.h", add prototypes for introduced new functions.
>>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
>>> checks on ability of loop unswitching to tree_unswitch_single_loop;
>>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
>>> on innermost loop check.
>>> (tree_unswitch_single_loop): Add all required checks on ability of
>>> loop unswitching under zero recursive level guard.
>>> (tree_unswitch_outer_loop): New function.
>>> (find_loop_guard): Likewise.
>>> (empty_bb_without_guard_p): Likewise.
>>> (used_outside_loop_p): Likewise.
>>> (hoist_guard): Likewise.
>>> (check_exit_phi): Likewise.
>>>
>>>    gcc/testsuite/ChangeLog:
>>> * gcc.dg/loop-unswitch-2.c: New test.
>>> * gcc.dg/loop-unswitch-3.c: Likewise.
>>> * gcc.dg/loop-unswitch-4.c: Likewise.
>>>
>>> 2015-10-06 10:59 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>> On Mon, Oct 5, 2015 at 3:13 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>>> Thanks Richard.
>>>>> I'd like to answer on your last comment related to using of exit edge
>>>>> argument for edge that skips loop.
>>>>> Let's consider the following test-case:
>>>>>
>>>>> #include <stdlib.h>
>>>>> #define N 32
>>>>> float *foo(int ustride, int size, float *src)
>>>>> {
>>>>>    float *buffer, *p;
>>>>>    int i, k;
>>>>>
>>>>>    if (!src)
>>>>>     return NULL;
>>>>>
>>>>>    buffer = (float *) malloc(N * size * sizeof(float));
>>>>>
>>>>>    if(buffer)
>>>>>       for(i=0, p=buffer; i<N; i++, src+=ustride)
>>>>> for(k=0; k<size; k++)
>>>>>  *p++ = src[k];
>>>>>
>>>>>    return buffer;
>>>>> }
>>>>>
>>>>> Before adding new edge we have in post-header bb:
>>>>>   <bb 9>:
>>>>>   # _6 = PHI <0B(8), buffer_20(16)>
>>>>>   return _6;
>>>>>
>>>>> It is clear that we must preserve function semantic and transform it to
>>>>> _6 = PHI <0B(12), buffer_19(9), buffer_19(4)>
>>>>
>>>> Ah, yeah.  I was confusing the loop exit of the inner vs. the outer loop.
>>>>
>>>> Richard.
>>>>
>>>>>
>>>>> 2015-10-05 13:57 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>> On Wed, Sep 30, 2015 at 12:46 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>> wrote:
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> I re-designed outer loop unswitching using basic idea of 23855 patch -
>>>>>>> hoist invariant guard if loop is empty without guard. Note that this
>>>>>>> was added to loop unswitching pass with simple modifications - using
>>>>>>> another loop iterator etc.
>>>>>>>
>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>> What is your opinion?
>>>>>>
>>>>>> Overall it looks good.  Some comments below - a few more testcases would
>>>>>> be nice as well.
>>>>>>
>>>>>> +  /* Loop must not be infinite.  */
>>>>>> +  if (!finite_loop_p (loop))
>>>>>> +    return false;
>>>>>>
>>>>>> why's that?
>>>>>>
>>>>>> +  body = get_loop_body_in_dom_order (loop);
>>>>>> +  for (i = 0; i < loop->num_nodes; i++)
>>>>>> +    {
>>>>>> +      if (body[i]->loop_father != loop)
>>>>>> +       continue;
>>>>>> +      if (!empty_bb_without_guard_p (loop, body[i]))
>>>>>>
>>>>>> I wonder if there is a better way to iterate over the interesting
>>>>>> blocks and PHIs
>>>>>> we need to check for side-effects (and thus we maybe can avoid gathering
>>>>>> the loop in DOM order).
>>>>>>
>>>>>> +      FOR_EACH_SSA_TREE_OPERAND (name, stmt, op_iter, SSA_OP_DEF)
>>>>>> +       {
>>>>>> +         if (may_be_used_outside
>>>>>>
>>>>>> may_be_used_outside can be hoisted above the loop.  I wonder if we can 
>>>>>> take
>>>>>> advantage of loop-closed SSA form here (and the fact we have a single 
>>>>>> exit
>>>>>> from the loop).  Iterating over exit dest PHIs and determining whether 
>>>>>> the
>>>>>> exit edge DEF is inside the loop part it may not be should be enough.
>>>>>>
>>>>>> +  gcc_assert (single_succ_p (pre_header));
>>>>>>
>>>>>> that should be always true.
>>>>>>
>>>>>> +  gsi_remove (&gsi, false);
>>>>>> +  bb = guard->dest;
>>>>>> +  remove_edge (guard);
>>>>>> +  /* Update dominance for destination of GUARD.  */
>>>>>> +  if (EDGE_COUNT (bb->preds) == 0)
>>>>>> +    {
>>>>>> +      basic_block s_bb;
>>>>>> +      gcc_assert (single_succ_p (bb));
>>>>>> +      s_bb = single_succ (bb);
>>>>>> +      delete_basic_block (bb);
>>>>>> +      if (single_pred_p (s_bb))
>>>>>> +       set_immediate_dominator (CDI_DOMINATORS, s_bb, single_pred 
>>>>>> (s_bb));
>>>>>>
>>>>>> all this massaging should be simplified by leaving it to CFG cleanup by
>>>>>> simply adjusting the CONDs condition to always true/false.  There is
>>>>>> gimple_cond_make_{true,false} () for this (would be nice to have a 
>>>>>> variant
>>>>>> taking a bool).
>>>>>>
>>>>>> +  new_edge = make_edge (pre_header, exit->dest, flags);
>>>>>> +  if (fix_dom_of_exit)
>>>>>> +    set_immediate_dominator (CDI_DOMINATORS, exit->dest, pre_header);
>>>>>> +  update_stmt (gsi_stmt (gsi));
>>>>>>
>>>>>> the update_stmt should be not necessary, it's done by gsi_insert_after 
>>>>>> already.
>>>>>>
>>>>>> +  /* Add NEW_ADGE argument for all phi in post-header block.  */
>>>>>> +  bb = exit->dest;
>>>>>> +  for (gphi_iterator gsi = gsi_start_phis (bb);
>>>>>> +       !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>> +    {
>>>>>> +      gphi *phi = gsi.phi ();
>>>>>> +      /* edge_iterator ei; */
>>>>>> +      tree arg;
>>>>>> +      if (virtual_operand_p (gimple_phi_result (phi)))
>>>>>> +       {
>>>>>> +         arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
>>>>>> +         add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>>>>>> +       }
>>>>>> +      else
>>>>>> +       {
>>>>>> +         /* Use exit edge argument.  */
>>>>>> +         arg = PHI_ARG_DEF_FROM_EDGE (phi, exit);
>>>>>> +         add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>>>>>>
>>>>>> Hum.  How is it ok to use the exit edge argument for the edge that skips
>>>>>> the loop?  Why can't you always use the pre-header edge value?
>>>>>> That is, if we have
>>>>>>
>>>>>>  for(i=0;i<m;++i)
>>>>>>    {
>>>>>>      if (n > 0)
>>>>>>     {
>>>>>>      for (;;)
>>>>>>        {
>>>>>>        }
>>>>>>      }
>>>>>>    }
>>>>>>   ... = i;
>>>>>>
>>>>>> then i is used after the loop and the correct value to use if
>>>>>> n > 0 is false is '0'.  Maybe this way we can also relax
>>>>>> what check_exit_phi does?  IMHO the only restriction is
>>>>>> if sth defined inside the loop before the header check for
>>>>>> the inner loop is used after the loop.
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> ChangeLog:
>>>>>>> 2015-09-30  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>
>>>>>>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
>>>>>>> "cfghooks.h", add prototypes for introduced new functions.
>>>>>>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
>>>>>>> checks on ability of loop unswitching to tree_unswitch_single_loop;
>>>>>>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
>>>>>>> on innermost loop check.
>>>>>>> (tree_unswitch_single_loop): Add all required checks on ability of
>>>>>>> loop unswitching under zero recursive level guard.
>>>>>>> (tree_unswitch_outer_loop): New function.
>>>>>>> (find_loop_guard): Likewise.
>>>>>>> (empty_bb_without_guard_p): Likewise.
>>>>>>> (used_outside_loop_p): Likewise.
>>>>>>> (hoist_guard): Likewise.
>>>>>>> (check_exit_phi): Likewise.
>>>>>>>
>>>>>>>    gcc/testsuite/ChangeLog:
>>>>>>> * gcc.dg/loop-unswitch-2.c: New test.
>>>>>>>
>>>>>>> 2015-09-16 11:26 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>> Yeah, as said, the patch wasn't fully ready and it also felt odd to do
>>>>>>>> this hoisting in loop header copying.  Integrating it
>>>>>>>> with LIM would be a better fit eventually.
>>>>>>>>
>>>>>>>> Note that we did agree to go forward with your original patch just
>>>>>>>> making it more "generically" perform outer loop
>>>>>>>> unswitching.  Did you explore that idea further?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>>> Thanks Richard.
>>>>>>>>>
>>>>>>>>> I found one more issue that could not be fixed simply. In 23855 you
>>>>>>>>> consider the following test-case:
>>>>>>>>> void foo(int *ie, int *je, double *x)
>>>>>>>>> {
>>>>>>>>>   int i, j;
>>>>>>>>>   for (j=0; j<*je; ++j)
>>>>>>>>>     for (i=0; i<*ie; ++i)
>>>>>>>>>       x[i+j] = 0.0;
>>>>>>>>> }
>>>>>>>>> and proposed to hoist up a check on *ie out of loop. It requires
>>>>>>>>> memref alias analysis since in general x and ie can alias (if their
>>>>>>>>> types are compatible - int *ie & int * x). Such analysis is performed
>>>>>>>>> by pre or lim passes. Without such analysis we can not hoist a test on
>>>>>>>>> non-zero for *ie out of loop using 238565 patch.
>>>>>>>>>  The second concern is that proposed copy header algorithm changes
>>>>>>>>> loop structure significantly and it is not accepted by vectorizer
>>>>>>>>> since latch is not empty (such transformation assumes loop peeling for
>>>>>>>>> one iteration. So I can propose to implement simple guard hoisting
>>>>>>>>> without copying header and tail blocks (if it is possible).
>>>>>>>>>
>>>>>>>>> I will appreciate you for any advice or help since without such
>>>>>>>>> hoisting we are not able to perform outer loop vectorization for
>>>>>>>>> important benchmark.
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>> 2015-09-15 14:22 GMT+03:00 Richard Biener 
>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>>>>> wrote:
>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>
>>>>>>>>>>> I started learning, tuning and debugging patch proposed in 23855 and
>>>>>>>>>>> discovered thta it does not work properly.
>>>>>>>>>>> So I wonder is it tested patch and it should work?
>>>>>>>>>>
>>>>>>>>>> I don't remember, but as it wasn't committed it certainly wasn't 
>>>>>>>>>> ready.
>>>>>>>>>>
>>>>>>>>>>> Should it accept for hoisting the following loop nest
>>>>>>>>>>>   for (i=0; i<n; i++) {
>>>>>>>>>>>     s = 0;
>>>>>>>>>>>     for (j=0; j<m; j++)
>>>>>>>>>>>       s += a[i] * b[j];
>>>>>>>>>>>     c[i] = s;
>>>>>>>>>>>   }
>>>>>>>>>>> Note that i-loop will nit be empty if m is equal to 0.
>>>>>>>>>>
>>>>>>>>>> if m is equal to 0 then we still have the c[i] = s store, no?  Of 
>>>>>>>>>> course
>>>>>>>>>> we could unswitch the outer loop on m == 0 but simple hoisting 
>>>>>>>>>> wouldn't work.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> 2015-08-03 10:27 GMT+03:00 Richard Biener 
>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev 
>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I learned your updated patch for 23825 and it is more general in
>>>>>>>>>>>>> comparison with my.
>>>>>>>>>>>>> I'd like to propose you a compromise - let's consider my patch 
>>>>>>>>>>>>> only
>>>>>>>>>>>>> for force-vectorize outer loop only to allow outer-loop
>>>>>>>>>>>>> vecctorization.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't see why we should special-case that if the approach in 
>>>>>>>>>>>> 23825
>>>>>>>>>>>> is sensible.
>>>>>>>>>>>>
>>>>>>>>>>>>> Note that your approach will not hoist invariant
>>>>>>>>>>>>> guards if loops contains something else except for inner-loop, 
>>>>>>>>>>>>> i.e. it
>>>>>>>>>>>>> won't be empty for taken branch.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, it does not perform unswitching but guard hoisting.  Note 
>>>>>>>>>>>> that this
>>>>>>>>>>>> is originally Zdenek Dvoraks patch.
>>>>>>>>>>>>
>>>>>>>>>>>>> I also would like to answer on your last question - CFG cleanup is
>>>>>>>>>>>>> invoked to perform deletion of single-argument phi nodes from tail
>>>>>>>>>>>>> block through substitution - such phi's prevent outer-loop
>>>>>>>>>>>>> vectorization. But it is clear that such transformation can be 
>>>>>>>>>>>>> done
>>>>>>>>>>>>> other pass.
>>>>>>>>>>>>
>>>>>>>>>>>> Hmm, I wonder why the copy_prop pass after unswitching does not
>>>>>>>>>>>> get rid of them?
>>>>>>>>>>>>
>>>>>>>>>>>>> What is your opinion?
>>>>>>>>>>>>
>>>>>>>>>>>> My opinion is that if we want to enhance unswitching to catch this
>>>>>>>>>>>> (or similar) cases then we should make it a lot more general than
>>>>>>>>>>>> your pattern-matching approach.  I see nothing that should prevent
>>>>>>>>>>>> us from considering unswitching non-innermost loops in general.
>>>>>>>>>>>> It should be only a cost consideration to not do non-innermost loop
>>>>>>>>>>>> unswitching (in addition to maybe a --param specifying the maximum
>>>>>>>>>>>> depth of a loop nest to unswitch).
>>>>>>>>>>>>
>>>>>>>>>>>> So my first thought when seeing your patch still holds - the patch
>>>>>>>>>>>> looks very much too specific.
>>>>>>>>>>>>
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-07-28 13:50 GMT+03:00 Richard Biener 
>>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>>> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev 
>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I checked that both test-cases from 23855 are sucessfully 
>>>>>>>>>>>>>>> unswitched
>>>>>>>>>>>>>>> by proposed patch. I understand that it does not catch deeper 
>>>>>>>>>>>>>>> loop
>>>>>>>>>>>>>>> nest as
>>>>>>>>>>>>>>>    for (i=0; i<10; i++)
>>>>>>>>>>>>>>>      for (j=0;j<n;j++)
>>>>>>>>>>>>>>>         for (k=0;k<20;k++)
>>>>>>>>>>>>>>>   ...
>>>>>>>>>>>>>>> but duplication of middle-loop does not look reasonable.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is dump for your second test-case:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> void foo(int *ie, int *je, double *x)
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>   int i, j;
>>>>>>>>>>>>>>>   for (j=0; j<*je; ++j)
>>>>>>>>>>>>>>>     for (i=0; i<*ie; ++i)
>>>>>>>>>>>>>>>       x[i+j] = 0.0;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> grep -i unswitch t6.c.119t.unswitch
>>>>>>>>>>>>>>> ;; Unswitching outer loop
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was saying that why go with a limited approach when a patch (in
>>>>>>>>>>>>>> unknown state...)
>>>>>>>>>>>>>> is available that does it more generally?  Also unswitching is 
>>>>>>>>>>>>>> quite
>>>>>>>>>>>>>> expensive compared
>>>>>>>>>>>>>> to "moving" the invariant condition.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In your patch:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +  if (!nloop->force_vectorize)
>>>>>>>>>>>>>> +    nloop->force_vectorize = true;
>>>>>>>>>>>>>> +  if (loop->safelen != 0)
>>>>>>>>>>>>>> +    nloop->safelen = loop->safelen;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I see no guard on force_vectorize so = true looks bogus here.  
>>>>>>>>>>>>>> Please just use
>>>>>>>>>>>>>> copy_loop_info.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +  if (integer_nonzerop (cond_new))
>>>>>>>>>>>>>> +    gimple_cond_set_condition_from_tree (cond_stmt, 
>>>>>>>>>>>>>> boolean_true_node);
>>>>>>>>>>>>>> +  else if (integer_zerop (cond_new))
>>>>>>>>>>>>>> +    gimple_cond_set_condition_from_tree (cond_stmt, 
>>>>>>>>>>>>>> boolean_false_node);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> gimple_cond_make_true/false (cond_stmt);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> btw, seems odd that we have to recompute which loop is the true 
>>>>>>>>>>>>>> / false variant
>>>>>>>>>>>>>> when we just fed a guard condition to loop_version.  Can't we 
>>>>>>>>>>>>>> statically
>>>>>>>>>>>>>> determine whether loop or nloop has the in-loop condition true 
>>>>>>>>>>>>>> or false?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +  /* Clean-up cfg to remove useless one-argument phi in exit 
>>>>>>>>>>>>>> block of
>>>>>>>>>>>>>> +     outer-loop.  */
>>>>>>>>>>>>>> +  cleanup_tree_cfg ();
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I know unswitching is already O(number-of-unswitched-loops * 
>>>>>>>>>>>>>> size-of-function)
>>>>>>>>>>>>>> because it updates SSA form after each individual unswitching 
>>>>>>>>>>>>>> (and it does that
>>>>>>>>>>>>>> because it invokes itself recursively on unswitched loops).  But 
>>>>>>>>>>>>>> do you really
>>>>>>>>>>>>>> need to invoke CFG cleanup here?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-07-14 14:06 GMT+03:00 Richard Biener 
>>>>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev 
>>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here is presented simple transformation which tries to hoist 
>>>>>>>>>>>>>>>>> out of
>>>>>>>>>>>>>>>>> outer-loop a check on zero trip count for inner-loop. This is 
>>>>>>>>>>>>>>>>> very
>>>>>>>>>>>>>>>>> restricted transformation since it accepts outer-loops with 
>>>>>>>>>>>>>>>>> very
>>>>>>>>>>>>>>>>> simple cfg, as for example:
>>>>>>>>>>>>>>>>>     acc = 0;
>>>>>>>>>>>>>>>>>    for (i = 1; i <= m; i++) {
>>>>>>>>>>>>>>>>>       for (j = 0; j < n; j++)
>>>>>>>>>>>>>>>>>          if (l[j] == i) { v[j] = acc; acc++; };
>>>>>>>>>>>>>>>>>       acc <<= 1;
>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Note that degenerative outer loop (without inner loop) will be
>>>>>>>>>>>>>>>>> completely deleted as dead code.
>>>>>>>>>>>>>>>>> The main goal of this transformation was to convert 
>>>>>>>>>>>>>>>>> outer-loop to form
>>>>>>>>>>>>>>>>> accepted by outer-loop vectorization (such test-case is also 
>>>>>>>>>>>>>>>>> included
>>>>>>>>>>>>>>>>> to patch).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new 
>>>>>>>>>>>>>>>>> failures.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think this is
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> as well.  It has a patch adding a invariant loop guard hoisting
>>>>>>>>>>>>>>>> phase to loop-header copying.  Yeah, it needs updating to
>>>>>>>>>>>>>>>> trunk again I suppose.  It's always non-stage1 when I come
>>>>>>>>>>>>>>>> back to that patch.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Your patch seems to be very specific and only handles outer
>>>>>>>>>>>>>>>> loops of innermost loops.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>> 2015-07-10  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and
>>>>>>>>>>>>>>>>> "gimple-iterator.h", add prototype for 
>>>>>>>>>>>>>>>>> tree_unswitch_outer_loop.
>>>>>>>>>>>>>>>>> (tree_ssa_unswitch_loops): Add invoke of 
>>>>>>>>>>>>>>>>> tree_unswitch_outer_loop.
>>>>>>>>>>>>>>>>> (tree_unswitch_outer_loop): New function.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>>>>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test.
>>>>>>>>>>>>>>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.

Attachment: patch.fixed
Description: Binary data

Reply via email to