Hi Richard,

I re-designed outer loop unswitching using basic idea of 23855 patch -
hoist invariant guard if loop is empty without guard. Note that this
was added to loop unswitching pass with simple modifications - using
another loop iterator etc.

Bootstrap and regression testing did not show any new failures.
What is your opinion?

Thanks.

ChangeLog:
2015-09-30  Yuri Rumyantsev  <ysrum...@gmail.com>

* tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
"cfghooks.h", add prototypes for introduced new functions.
(tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
checks on ability of loop unswitching to tree_unswitch_single_loop;
invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
on innermost loop check.
(tree_unswitch_single_loop): Add all required checks on ability of
loop unswitching under zero recursive level guard.
(tree_unswitch_outer_loop): New function.
(find_loop_guard): Likewise.
(empty_bb_without_guard_p): Likewise.
(used_outside_loop_p): Likewise.
(hoist_guard): Likewise.
(check_exit_phi): Likewise.

   gcc/testsuite/ChangeLog:
* gcc.dg/loop-unswitch-2.c: New test.

2015-09-16 11:26 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
> Yeah, as said, the patch wasn't fully ready and it also felt odd to do
> this hoisting in loop header copying.  Integrating it
> with LIM would be a better fit eventually.
>
> Note that we did agree to go forward with your original patch just
> making it more "generically" perform outer loop
> unswitching.  Did you explore that idea further?
>
>
>
> On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Thanks Richard.
>>
>> I found one more issue that could not be fixed simply. In 23855 you
>> consider the following test-case:
>> void foo(int *ie, int *je, double *x)
>> {
>>   int i, j;
>>   for (j=0; j<*je; ++j)
>>     for (i=0; i<*ie; ++i)
>>       x[i+j] = 0.0;
>> }
>> and proposed to hoist up a check on *ie out of loop. It requires
>> memref alias analysis since in general x and ie can alias (if their
>> types are compatible - int *ie & int * x). Such analysis is performed
>> by pre or lim passes. Without such analysis we can not hoist a test on
>> non-zero for *ie out of loop using 238565 patch.
>>  The second concern is that proposed copy header algorithm changes
>> loop structure significantly and it is not accepted by vectorizer
>> since latch is not empty (such transformation assumes loop peeling for
>> one iteration. So I can propose to implement simple guard hoisting
>> without copying header and tail blocks (if it is possible).
>>
>> I will appreciate you for any advice or help since without such
>> hoisting we are not able to perform outer loop vectorization for
>> important benchmark.
>> and
>>
>> 2015-09-15 14:22 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>> Hi Richard,
>>>>
>>>> I started learning, tuning and debugging patch proposed in 23855 and
>>>> discovered thta it does not work properly.
>>>> So I wonder is it tested patch and it should work?
>>>
>>> I don't remember, but as it wasn't committed it certainly wasn't ready.
>>>
>>>> Should it accept for hoisting the following loop nest
>>>>   for (i=0; i<n; i++) {
>>>>     s = 0;
>>>>     for (j=0; j<m; j++)
>>>>       s += a[i] * b[j];
>>>>     c[i] = s;
>>>>   }
>>>> Note that i-loop will nit be empty if m is equal to 0.
>>>
>>> if m is equal to 0 then we still have the c[i] = s store, no?  Of course
>>> we could unswitch the outer loop on m == 0 but simple hoisting wouldn't 
>>> work.
>>>
>>> Richard.
>>>
>>>> 2015-08-03 10:27 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>> wrote:
>>>>>> Hi Richard,
>>>>>>
>>>>>> I learned your updated patch for 23825 and it is more general in
>>>>>> comparison with my.
>>>>>> I'd like to propose you a compromise - let's consider my patch only
>>>>>> for force-vectorize outer loop only to allow outer-loop
>>>>>> vecctorization.
>>>>>
>>>>> I don't see why we should special-case that if the approach in 23825
>>>>> is sensible.
>>>>>
>>>>>> Note that your approach will not hoist invariant
>>>>>> guards if loops contains something else except for inner-loop, i.e. it
>>>>>> won't be empty for taken branch.
>>>>>
>>>>> Yes, it does not perform unswitching but guard hoisting.  Note that this
>>>>> is originally Zdenek Dvoraks patch.
>>>>>
>>>>>> I also would like to answer on your last question - CFG cleanup is
>>>>>> invoked to perform deletion of single-argument phi nodes from tail
>>>>>> block through substitution - such phi's prevent outer-loop
>>>>>> vectorization. But it is clear that such transformation can be done
>>>>>> other pass.
>>>>>
>>>>> Hmm, I wonder why the copy_prop pass after unswitching does not
>>>>> get rid of them?
>>>>>
>>>>>> What is your opinion?
>>>>>
>>>>> My opinion is that if we want to enhance unswitching to catch this
>>>>> (or similar) cases then we should make it a lot more general than
>>>>> your pattern-matching approach.  I see nothing that should prevent
>>>>> us from considering unswitching non-innermost loops in general.
>>>>> It should be only a cost consideration to not do non-innermost loop
>>>>> unswitching (in addition to maybe a --param specifying the maximum
>>>>> depth of a loop nest to unswitch).
>>>>>
>>>>> So my first thought when seeing your patch still holds - the patch
>>>>> looks very much too specific.
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Yuri.
>>>>>>
>>>>>> 2015-07-28 13:50 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>> wrote:
>>>>>>>> Hi Richard,
>>>>>>>>
>>>>>>>> I checked that both test-cases from 23855 are sucessfully unswitched
>>>>>>>> by proposed patch. I understand that it does not catch deeper loop
>>>>>>>> nest as
>>>>>>>>    for (i=0; i<10; i++)
>>>>>>>>      for (j=0;j<n;j++)
>>>>>>>>         for (k=0;k<20;k++)
>>>>>>>>   ...
>>>>>>>> but duplication of middle-loop does not look reasonable.
>>>>>>>>
>>>>>>>> Here is dump for your second test-case:
>>>>>>>>
>>>>>>>> void foo(int *ie, int *je, double *x)
>>>>>>>> {
>>>>>>>>   int i, j;
>>>>>>>>   for (j=0; j<*je; ++j)
>>>>>>>>     for (i=0; i<*ie; ++i)
>>>>>>>>       x[i+j] = 0.0;
>>>>>>>> }
>>>>>>>> grep -i unswitch t6.c.119t.unswitch
>>>>>>>> ;; Unswitching outer loop
>>>>>>>
>>>>>>> I was saying that why go with a limited approach when a patch (in
>>>>>>> unknown state...)
>>>>>>> is available that does it more generally?  Also unswitching is quite
>>>>>>> expensive compared
>>>>>>> to "moving" the invariant condition.
>>>>>>>
>>>>>>> In your patch:
>>>>>>>
>>>>>>> +  if (!nloop->force_vectorize)
>>>>>>> +    nloop->force_vectorize = true;
>>>>>>> +  if (loop->safelen != 0)
>>>>>>> +    nloop->safelen = loop->safelen;
>>>>>>>
>>>>>>> I see no guard on force_vectorize so = true looks bogus here.  Please 
>>>>>>> just use
>>>>>>> copy_loop_info.
>>>>>>>
>>>>>>> +  if (integer_nonzerop (cond_new))
>>>>>>> +    gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node);
>>>>>>> +  else if (integer_zerop (cond_new))
>>>>>>> +    gimple_cond_set_condition_from_tree (cond_stmt, 
>>>>>>> boolean_false_node);
>>>>>>>
>>>>>>> gimple_cond_make_true/false (cond_stmt);
>>>>>>>
>>>>>>> btw, seems odd that we have to recompute which loop is the true / false 
>>>>>>> variant
>>>>>>> when we just fed a guard condition to loop_version.  Can't we statically
>>>>>>> determine whether loop or nloop has the in-loop condition true or false?
>>>>>>>
>>>>>>> +  /* Clean-up cfg to remove useless one-argument phi in exit block of
>>>>>>> +     outer-loop.  */
>>>>>>> +  cleanup_tree_cfg ();
>>>>>>>
>>>>>>> I know unswitching is already O(number-of-unswitched-loops * 
>>>>>>> size-of-function)
>>>>>>> because it updates SSA form after each individual unswitching (and it 
>>>>>>> does that
>>>>>>> because it invokes itself recursively on unswitched loops).  But do you 
>>>>>>> really
>>>>>>> need to invoke CFG cleanup here?
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>> 2015-07-14 14:06 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev 
>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> Here is presented simple transformation which tries to hoist out of
>>>>>>>>>> outer-loop a check on zero trip count for inner-loop. This is very
>>>>>>>>>> restricted transformation since it accepts outer-loops with very
>>>>>>>>>> simple cfg, as for example:
>>>>>>>>>>     acc = 0;
>>>>>>>>>>    for (i = 1; i <= m; i++) {
>>>>>>>>>>       for (j = 0; j < n; j++)
>>>>>>>>>>          if (l[j] == i) { v[j] = acc; acc++; };
>>>>>>>>>>       acc <<= 1;
>>>>>>>>>>    }
>>>>>>>>>>
>>>>>>>>>> Note that degenerative outer loop (without inner loop) will be
>>>>>>>>>> completely deleted as dead code.
>>>>>>>>>> The main goal of this transformation was to convert outer-loop to 
>>>>>>>>>> form
>>>>>>>>>> accepted by outer-loop vectorization (such test-case is also included
>>>>>>>>>> to patch).
>>>>>>>>>>
>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>
>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>
>>>>>>>>> I think this is
>>>>>>>>>
>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855
>>>>>>>>>
>>>>>>>>> as well.  It has a patch adding a invariant loop guard hoisting
>>>>>>>>> phase to loop-header copying.  Yeah, it needs updating to
>>>>>>>>> trunk again I suppose.  It's always non-stage1 when I come
>>>>>>>>> back to that patch.
>>>>>>>>>
>>>>>>>>> Your patch seems to be very specific and only handles outer
>>>>>>>>> loops of innermost loops.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> ChangeLog:
>>>>>>>>>> 2015-07-10  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>
>>>>>>>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and
>>>>>>>>>> "gimple-iterator.h", add prototype for tree_unswitch_outer_loop.
>>>>>>>>>> (tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop.
>>>>>>>>>> (tree_unswitch_outer_loop): New function.
>>>>>>>>>>
>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test.
>>>>>>>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.

Attachment: patch.new
Description: Binary data

Reply via email to