Re: [PATCH][GRAPHITE] Fix PR82451 (and PR82355 in a different way)

Richard Biener Thu, 12 Oct 2017 04:14:16 -0700

On Thu, 12 Oct 2017, Bin.Cheng wrote:

> On Wed, Oct 11, 2017 at 3:43 PM, Richard Biener <rguent...@suse.de> wrote:
> >
> > For PR82355 I introduced a fake dimension to ISL to allow CHRECs
> > having an evolution in a loop that isn't fully part of the SESE
> > region we are processing.  That was easier than fending off those
> > CHRECs (without simply giving up on SESE regions with those).
> >
> > But it didn't fully solve the issue as PR82451 shows where we run
> > into the issue that we eventually have to code-gen those
> > evolutions and thus in theory need a canonical IV of that containing loop.
> >
> > So I decided (after Micha pressuring me a bit...) to revisit the
> > original issue and make SCEV analysis "properly" handle SE regions.
> > It turns out that it is mostly instantiate_scev lacking proper support
> > plus the necessary interfacing change (really just cosmetic in some sense)
> > from a instantiate_before basic-block to a instantiate_before edge.
> >
> > data-ref interfaces have been similarly adjusted, here changing
> > the "loop nest" loop parameter to the entry edge for the SE region
> > and passing that down accordingly.
> >
> > I've for now tried to keep other high-level loop-based interfaces the
> > same by simply using the loop preheader edge as entry where appropriate
> > (needing loop_preheader_edge cope with the loop root tree for simplicity).
> >
> > In the process I ran into issues with us too overly aggressive
> > instantiating random trees and thus I cut those down.  That part
> > doesn't successfully test separately (when I remove the strange
> > ARRAY_REF instantiation), so it's part of this patch.  I've also
> > run into an SSA verification fail (the id-27.f90 testcase) which
> > shows we _do_ need to clear the SCEV cache after introducing
> > the versioned CFG (and added a comment before it).
> >
> > On the previously failing testcases I've verified we produce
> > sensible instantiations for those pesky refs residing in "no" loop
> > in the SCOP and that we get away with the result in terms of
> > optimizing.
> >
> > SPEC 2k6 testing shows
> >
> > loop nest optimized: 311
> > loop nest not optimized, code generation error: 0
> > loop nest not optimized, optimized schedule is identical to original
> > schedule: 173
> > loop nest not optimized, optimization timed out: 59
> > loop nest not optimized, ISL signalled an error: 9
> > loop nest: 552
> >
> > for SPEC 2k6 and -floop-nest-optimize while adding -fgraphite-identity
> > still reveals some codegen errors:
> >
> > loop nest optimized: 437
> > loop nest not optimized, code generation error: 25
> > loop nest not optimized, optimized schedule is identical to original
> > schedule: 169
> > loop nest not optimized, optimization timed out: 60
> > loop nest not optimized, ISL signalled an error: 9
> > loop nest: 700
> >
> > Bootstrap and regtest in progress on x86_64-unknown-linux-gnu
> > (with and without -fgraphite-identity -floop-nest-optimize).
> >
> > Ok?
> >
> > Thanks,
> > Richard.
> >
> 
> > Index: gcc/tree-scalar-evolution.c
> > ===================================================================
> > --- gcc/tree-scalar-evolution.c (revision 253645)
> > +++ gcc/tree-scalar-evolution.c (working copy)
> > @@ -2344,7 +2348,7 @@ static tree instantiate_scev_r (basic_bl
> >     instantiated, and to stop if it exceeds some limit.  */
> >
> >  static tree
> > -instantiate_scev_name (basic_block instantiate_below,
> > +instantiate_scev_name (edge instantiate_below,
> >                        struct loop *evolution_loop, struct loop *inner_loop,
> >                        tree chrec,
> >                        bool *fold_conversions,
> > @@ -2358,7 +2362,7 @@ instantiate_scev_name (basic_block insta
> >       evolutions in outer loops), nothing to do.  */
> >    if (!def_bb
> >        || loop_depth (def_bb->loop_father) == 0
> > -      || dominated_by_p (CDI_DOMINATORS, instantiate_below, def_bb))
> > +      || ! dominated_by_p (CDI_DOMINATORS, def_bb, 
> > instantiate_below->dest))
> >      return chrec;
> >
> >    /* We cache the value of instantiated variable to avoid exponential
> > @@ -2380,6 +2384,51 @@ instantiate_scev_name (basic_block insta
> >
> >    def_loop = find_common_loop (evolution_loop, def_bb->loop_father);
> >
> > +  if (! dominated_by_p (CDI_DOMINATORS,
> > +                       def_loop->header, instantiate_below->dest))
> > +    {
> > +      gimple *def = SSA_NAME_DEF_STMT (chrec);
> > +      if (gassign *ass = dyn_cast <gassign *> (def))
> > +       {
> > +         switch (gimple_assign_rhs_class (ass))
> > +           {
> > +           case GIMPLE_UNARY_RHS:
> > +             {
> > +               tree op0 = instantiate_scev_r (instantiate_below, 
> > evolution_loop,
> > +                                              inner_loop, 
> > gimple_assign_rhs1 (ass),
> > +                                              fold_conversions, size_expr);
> > +               if (op0 == chrec_dont_know)
> > +                 return chrec_dont_know;
> > +               res = fold_build1 (gimple_assign_rhs_code (ass),
> > +                                  TREE_TYPE (chrec), op0);
> > +               break;
> > +             }
> > +           case GIMPLE_BINARY_RHS:
> > +             {
> > +               tree op0 = instantiate_scev_r (instantiate_below, 
> > evolution_loop,
> > +                                              inner_loop, 
> > gimple_assign_rhs1 (ass),
> > +                                              fold_conversions, size_expr);
> > +               if (op0 == chrec_dont_know)
> > +                 return chrec_dont_know;
> > +               tree op1 = instantiate_scev_r (instantiate_below, 
> > evolution_loop,
> > +                                              inner_loop, 
> > gimple_assign_rhs2 (ass),
> > +                                              fold_conversions, size_expr);
> > +               if (op1 == chrec_dont_know)
> > +                 return chrec_dont_know;
> > +               res = fold_build2 (gimple_assign_rhs_code (ass),
> > +                                  TREE_TYPE (chrec), op0, op1);
> > +               break;
> > +             }
> > +           default:
> > +             res = chrec_dont_know;
> > +           }
> > +       }
> > +      else
> > +       res = chrec_dont_know;
> > +      global_cache->set (si, res);
> > +      return res;
> > +    }
> > +
> >    /* If the analysis yields a parametric chrec, instantiate the
> >       result again.  */
> >    res = analyze_scalar_evolution (def_loop, chrec);
> 
> IIUC, after changing instantiate_scev_r from loop based to region
> based, there are
> two cases.  In one case, def_loop is dominated by instantiate_edge,
> which we'd like
> to analyze/instantiate scev wrto the new def_loop; In the other case,
> def_loop is not
> fully part of sese region, which we'd like to expand ssa_name wrto the
> basic block
> instantiate_edge->dest.  It's simply ssa expanding and no loop is involved.


Note there can be still a loop involved if the SSA chain arrives at
a DEF that is defined within a loop inbetween the current def and
instantiate_edge->dest.  In that case we need to process the
compute_overall_effect_of_inner_loop case.

> So how about factor out above big if-statement into a function with name like
> expand_scev_name (Other better names?).  The code like:
> 
>   /* Some comment explaining the two cases in region based instantiation.  */
>   if (dominated_by_p (CDI_DOMINATORS, def_loop->header,
> instantiate_below->dest))
>     res = analyze_scalar_evolution (def_loop, chrec);
>   else
>     res = expand_scev_name (instantiate_below, chrec);
> 
> could be easier to read?

Note it would be

   else
     {
       res = expand_scev_name (instantiate_below, chrec);
       global_cache->set (si, res);
       return res;
     }

and expand_scev_name would still need to recurse via instantiate_scev.
It isn't merely gathering all expressions up to instantiate_edge->dest
from the stmts (see above).

But yes, factoring the above might be a good idea.  The above is also
not 100% equivalent in capabilities as the rest of the instantiation
machinery -- it lacks condition PHI handling and it fails to capture
the loop-closed PHI handling (so the above mentioned case wouldn't
be handled right now).

I sent the patch out for comments early before fleshing out all
those pesky details ;)  It's at least conservative correct
right now.

Richard.

Re: [PATCH][GRAPHITE] Fix PR82451 (and PR82355 in a different way)

Reply via email to