On Wed, 22 Sep 2021, Xionghu Luo wrote:

> 
> 
> On 2021/8/11 17:16, Richard Biener wrote:
> > On Wed, 11 Aug 2021, Xionghu Luo wrote:
> > 
> >>
> >>
> >> On 2021/8/10 22:47, Richard Biener wrote:
> >>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
> >>>
> >>>> Thanks,
> >>>>
> >>>> On 2021/8/6 19:46, Richard Biener wrote:
> >>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
> >>>>>
> >>>>>> loop split condition is moved between loop1 and loop2, the split bb's
> >>>>>> count and probability should also be duplicated instead of (100% vs
> >>>>>> INV),
> >>>>>> secondly, the original loop1 and loop2 count need be propotional from
> >>>>>> the
> >>>>>> original loop.
> >>>>>>
> >>>>>>
> >>>>>> diff base/loop-cond-split-1.c.151t.lsplit
> >>>>>> patched/loop-cond-split-1.c.151t.lsplit:
> >>>>>> ...
> >>>>>>       int prephitmp_16;
> >>>>>>       int prephitmp_25;
> >>>>>>
> >>>>>>       <bb 2> [local count: 118111600]:
> >>>>>>       if (n_7(D) > 0)
> >>>>>>         goto <bb 4>; [89.00%]
> >>>>>>       else
> >>>>>>         goto <bb 3>; [11.00%]
> >>>>>>
> >>>>>>       <bb 3> [local count: 118111600]:
> >>>>>>       return;
> >>>>>>
> >>>>>>       <bb 4> [local count: 105119324]:
> >>>>>>       pretmp_3 = ga;
> >>>>>>
> >>>>>> -  <bb 5> [local count: 955630225]:
> >>>>>> +  <bb 5> [local count: 315357973]:
> >>>>>>       # i_13 = PHI <i_10(20), 0(4)>
> >>>>>>       # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
> >>>>>>       if (prephitmp_12 != 0)
> >>>>>>         goto <bb 6>; [33.00%]
> >>>>>>       else
> >>>>>>         goto <bb 7>; [67.00%]
> >>>>>>
> >>>>>> -  <bb 6> [local count: 315357972]:
> >>>>>> +  <bb 6> [local count: 104068130]:
> >>>>>>       _2 = do_something ();
> >>>>>>       ga = _2;
> >>>>>>
> >>>>>> -  <bb 7> [local count: 955630225]:
> >>>>>> +  <bb 7> [local count: 315357973]:
> >>>>>>       # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
> >>>>>>       i_10 = inc (i_13);
> >>>>>>       if (n_7(D) > i_10)
> >>>>>>         goto <bb 21>; [89.00%]
> >>>>>>       else
> >>>>>>         goto <bb 11>; [11.00%]
> >>>>>>
> >>>>>>       <bb 11> [local count: 105119324]:
> >>>>>>       goto <bb 3>; [100.00%]
> >>>>>>
> >>>>>> -  <bb 21> [local count: 850510901]:
> >>>>>> +  <bb 21> [local count: 280668596]:
> >>>>>>       if (prephitmp_12 != 0)
> >>>>>> -    goto <bb 20>; [100.00%]
> >>>>>> +    goto <bb 20>; [33.00%]
> >>>>>>       else
> >>>>>> -    goto <bb 19>; [INV]
> >>>>>> +    goto <bb 19>; [67.00%]
> >>>>>>
> >>>>>> -  <bb 20> [local count: 850510901]:
> >>>>>> +  <bb 20> [local count: 280668596]:
> >>>>>>       goto <bb 5>; [100.00%]
> >>>>>>
> >>>>>> -  <bb 19> [count: 0]:
> >>>>>> +  <bb 19> [local count: 70429947]:
> >>>>>>       # i_23 = PHI <i_10(21)>
> >>>>>>       # prephitmp_25 = PHI <prephitmp_5(21)>
> >>>>>>
> >>>>>> -  <bb 12> [local count: 955630225]:
> >>>>>> +  <bb 12> [local count: 640272252]:
> >>>>>>       # i_15 = PHI <i_23(19), i_22(16)>
> >>>>>>       # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
> >>>>>>       i_22 = inc (i_15);
> >>>>>>       if (n_7(D) > i_22)
> >>>>>>         goto <bb 16>; [89.00%]
> >>>>>>       else
> >>>>>>         goto <bb 11>; [11.00%]
> >>>>>>
> >>>>>> -  <bb 16> [local count: 850510901]:
> >>>>>> +  <bb 16> [local count: 569842305]:
> >>>>>>       goto <bb 12>; [100.00%]
> >>>>>>
> >>>>>>     }
> >>>>>>
> >>>>>> gcc/ChangeLog:
> >>>>>>
> >>>>>>  * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> >>>>>>  (do_split_loop_on_cond): Likewise.
> >>>>>> ---
> >>>>>>     gcc/tree-ssa-loop-split.c | 16 ++++++++--------
> >>>>>>     1 file changed, 8 insertions(+), 8 deletions(-)
> >>>>>>
> >>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> >>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
> >>>>>> --- a/gcc/tree-ssa-loop-split.c
> >>>>>> +++ b/gcc/tree-ssa-loop-split.c
> >>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
> >>>>>>      basic_block cond_bb;
> >>>>
> >>>>          if (!initial_true)
> >>>> -          cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >>>> +          cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >>>> +
> >>>> +        edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> >>>> +                           ? EDGE_SUCC (bbs[i], 0)
> >>>> +                           : EDGE_SUCC (bbs[i], 1);
> >>>>
> >>>>>>     
> >>>>>>        class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> >>>>>> -                                         profile_probability::always
> >>>>>> (),
> >>>>>> -                                         profile_probability::always
> >>>>>> (),
> >>>>>> -                                         profile_probability::always
> >>>>>> (),
> >>>>>> -                                         profile_probability::always
> >>>>>> (),
> >>>>>> +                                         true_edge->probability,
> >>>>>> +
> >>>>>> true_edge->probability.invert (),
> >>>>>> +                                         true_edge->probability,
> >>>>>> +
> >>>>>> true_edge->probability.invert (),
> >>>>>>             true);
> >>>>>
> >>>>> there is no 'true_edge' variable at this point.
> >>>>
> >>>> Sorry, missed the above hunk when split the patch.
> >>>>
> >>>>>
> >>>>>>      gcc_assert (loop2);
> >>>>>>     @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1,
> >>>>>> edge invar_branch)
> >>>>>>       initialize_original_copy_tables ();
> >>>>>>     
> >>>>>>       struct loop *loop2 = loop_version (loop1, boolean_true_node,
> >>>>>> NULL,
> >>>>>> -                                   profile_probability::always (),
> >>>>>> -                                   profile_probability::never (),
> >>>>>> -                                   profile_probability::always (),
> >>>>>> -                                   profile_probability::always (),
> >>>>>> +                                   invar_branch->probability.invert
> >>>>>> (),
> >>>>>> +                                   invar_branch->probability,
> >>>>>> +                                   invar_branch->probability.invert
> >>>>>> (),
> >>>>>> +                                   invar_branch->probability,
> >>>>>>                                     true);
> >>>>>>       if (!loop2)
> >>>>>>         {
> >>>>>
> >>>>> The patch introduction seems to talk about do_split_loop_on_cond only.
> >>>>
> >>>> split_loop faces similar issue though it sets the two branches to 100% vs
> >>>> 100%
> >>>> and no scaling which seems also incorrect.
> >>>>
> >>>>> Since loop versioning inserts a condition with the passed probabilities
> >>>>> but in this case a 'boolean_true_node' condition the then and else
> >>>>> probabilities passed look correct.  It's just the scaling arguments
> >>>>> that look wrong?  This loop_version call should get a comment as to
> >>>>> why we are passing probabilities the way we do.
> >>>>
> >>>> This optimization is transforming from:
> >>>>
> >>>> for (i = 0; i < n; i = inc (i))
> >>>>       {
> >>>>         if (ga)
> >>>>           ga = do_something ();
> >>>> }
> >>>>
> >>>> to:
> >>>>
> >>>>     for (i = 0; i < x; i = inc (i))
> >>>> {
> >>>>       if (true)
> >>>>           ga = do_something ();
> >>>>           if (!ga)
> >>>>             break;
> >>>> }
> >>>>     for (; i < n; i = inc (i))
> >>>> {
> >>>>       if (false)
> >>>>            ga = do_something ();
> >>>> }
> >>>>
> >>>>
> >>>> `boolean_true_node` is passed in to make the first loop's condition
> >>>> statement to be 'true', after returning from loop_version, there is a
> >>>> piece of code forcing the condition in second loop to be false,
> >>>> and the original condition is moved from *in loop* to *exit edge*
> >>>> between loop1 and loop2.
> >>>
> >>> Yes, one complication is that we use loop_version but do not retain
> >>> the CFG it creates.  Something like the vectorizers
> >>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
> >>> but then that code doesn't do any scaling yet.  But then
> >>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
> >>> we could write a variant that simply doesn't mangle the CFG
> >>> with a new condition switching between both loops but keeps them
> >>> executed after each other ...
> >>>
> >>> As said, this adds to the confusion and some awkwardness.
> >>
> >> Then loop_version in loop split requires two types of variant, one
> >> is to insert condition to loop preheader for 'split_loops' usage,
> >> another is to insert condition to loop exit for 'do_split_loop_on_cond'
> >> usage, it needs one extra function to encapsulate these cfg alterations
> >> from outside to inside.
> >>
> >> unswitching only execute one loop as it only moves the invariant condition
> >> to first loop's pre-header.  While 'split_loops' and
> >> 'do_split_loop_on_cond'
> >> may execute both loops no matter the condition is moved to the first loop's
> >> preheader or exit.
> >>
> >> The condition stmt in loop unswitching is invariant, but it is variant
> >> in loop splitting, that's why loop unswitching execute only one loop
> >> but loop splitting executes both loops.
> >>
> >> Seems we need two condition arguments for loop_version, one for connecting
> >> loop1 preheader to loop2 preheader, another one for connecting loop1's exit
> >> to loop2's header?  Then it will be more generic for both unswitching pass
> >> and splitting pass.  Looks a bit complicated and conflicted with
> >> loop_version's
> >> comments:
> >>
> >> /* Main entry point for Loop Versioning transformation.
> >>
> >>     This transformation given a condition and a loop, creates
> >>     -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
> >>
> >>
> >> And this only works for loop split usage, those many other places
> >> doesn't use loop_version like this?
> > 
> > Yes, as said if you don't want the above CFG then you probably
> > shouldn't use loop_version but instead use its building blocks
> > (and some refactoring of loop_version can make that easier).
> > 
> > I think splitting wants
> > 
> >    loop_copy1
> >    if (condition)
> >      loop_copy2
> > 
> > IMHO it would be good to split 'loopify' into the actual loopification
> > and the scaling.  Basically make the part of loop_version that
> > copies the loop on the header edge and creates a loop structure for
> > it separate.
> > 
> > loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
> > (copy_loop_before).
> > 
> 
> Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports
> copying loops with single exit, it would cause many ICEs in it even
> building GCC stage1 (line 1065, line 1184 due to exit or new_exit
> is NULL returning from single_exit (loop).).  Seems loop_version is
> more flexible for loop split.

Hmm, sure - loop_version does not need to do anything special with
exits since control flow either enters the original or the loop copy.

But slpeel_tree_duplicate_loop_to_edge_cfg intends to create
control flow that enters _both_ loops, so it needs to have
an edge from the first loop exit to the second loop entry.

One could extend this to a region copy, copying eventual CFG merges
of exits and specifying which exit of a SEME region is supposed
to be connected to the original region entry.

After all that's what loop splitting needs in the end - though not
sure what exactly it does with more than one exit.

So there's another set of "loop" copying, gimple_duplicate_sese_region,
which doesn't actually require a single exit but a single "important"
exit.  That might match how you treat multiple exits.

Richard.

Reply via email to