On Thu, 12 Oct 2017, Bin.Cheng wrote: > On Wed, Oct 11, 2017 at 3:43 PM, Richard Biener <rguent...@suse.de> wrote: > > > > For PR82355 I introduced a fake dimension to ISL to allow CHRECs > > having an evolution in a loop that isn't fully part of the SESE > > region we are processing. That was easier than fending off those > > CHRECs (without simply giving up on SESE regions with those). > > > > But it didn't fully solve the issue as PR82451 shows where we run > > into the issue that we eventually have to code-gen those > > evolutions and thus in theory need a canonical IV of that containing loop. > > > > So I decided (after Micha pressuring me a bit...) to revisit the > > original issue and make SCEV analysis "properly" handle SE regions. > > It turns out that it is mostly instantiate_scev lacking proper support > > plus the necessary interfacing change (really just cosmetic in some sense) > > from a instantiate_before basic-block to a instantiate_before edge. > > > > data-ref interfaces have been similarly adjusted, here changing > > the "loop nest" loop parameter to the entry edge for the SE region > > and passing that down accordingly. > > > > I've for now tried to keep other high-level loop-based interfaces the > > same by simply using the loop preheader edge as entry where appropriate > > (needing loop_preheader_edge cope with the loop root tree for simplicity). > > > > In the process I ran into issues with us too overly aggressive > > instantiating random trees and thus I cut those down. That part > > doesn't successfully test separately (when I remove the strange > > ARRAY_REF instantiation), so it's part of this patch. I've also > > run into an SSA verification fail (the id-27.f90 testcase) which > > shows we _do_ need to clear the SCEV cache after introducing > > the versioned CFG (and added a comment before it). > > > > On the previously failing testcases I've verified we produce > > sensible instantiations for those pesky refs residing in "no" loop > > in the SCOP and that we get away with the result in terms of > > optimizing. > > > > SPEC 2k6 testing shows > > > > loop nest optimized: 311 > > loop nest not optimized, code generation error: 0 > > loop nest not optimized, optimized schedule is identical to original > > schedule: 173 > > loop nest not optimized, optimization timed out: 59 > > loop nest not optimized, ISL signalled an error: 9 > > loop nest: 552 > > > > for SPEC 2k6 and -floop-nest-optimize while adding -fgraphite-identity > > still reveals some codegen errors: > > > > loop nest optimized: 437 > > loop nest not optimized, code generation error: 25 > > loop nest not optimized, optimized schedule is identical to original > > schedule: 169 > > loop nest not optimized, optimization timed out: 60 > > loop nest not optimized, ISL signalled an error: 9 > > loop nest: 700 > > > > Bootstrap and regtest in progress on x86_64-unknown-linux-gnu > > (with and without -fgraphite-identity -floop-nest-optimize). > > > > Ok? > > > > Thanks, > > Richard. > > > > > Index: gcc/tree-scalar-evolution.c > > =================================================================== > > --- gcc/tree-scalar-evolution.c (revision 253645) > > +++ gcc/tree-scalar-evolution.c (working copy) > > @@ -2344,7 +2348,7 @@ static tree instantiate_scev_r (basic_bl > > instantiated, and to stop if it exceeds some limit. */ > > > > static tree > > -instantiate_scev_name (basic_block instantiate_below, > > +instantiate_scev_name (edge instantiate_below, > > struct loop *evolution_loop, struct loop *inner_loop, > > tree chrec, > > bool *fold_conversions, > > @@ -2358,7 +2362,7 @@ instantiate_scev_name (basic_block insta > > evolutions in outer loops), nothing to do. */ > > if (!def_bb > > || loop_depth (def_bb->loop_father) == 0 > > - || dominated_by_p (CDI_DOMINATORS, instantiate_below, def_bb)) > > + || ! dominated_by_p (CDI_DOMINATORS, def_bb, > > instantiate_below->dest)) > > return chrec; > > > > /* We cache the value of instantiated variable to avoid exponential > > @@ -2380,6 +2384,51 @@ instantiate_scev_name (basic_block insta > > > > def_loop = find_common_loop (evolution_loop, def_bb->loop_father); > > > > + if (! dominated_by_p (CDI_DOMINATORS, > > + def_loop->header, instantiate_below->dest)) > > + { > > + gimple *def = SSA_NAME_DEF_STMT (chrec); > > + if (gassign *ass = dyn_cast <gassign *> (def)) > > + { > > + switch (gimple_assign_rhs_class (ass)) > > + { > > + case GIMPLE_UNARY_RHS: > > + { > > + tree op0 = instantiate_scev_r (instantiate_below, > > evolution_loop, > > + inner_loop, > > gimple_assign_rhs1 (ass), > > + fold_conversions, size_expr); > > + if (op0 == chrec_dont_know) > > + return chrec_dont_know; > > + res = fold_build1 (gimple_assign_rhs_code (ass), > > + TREE_TYPE (chrec), op0); > > + break; > > + } > > + case GIMPLE_BINARY_RHS: > > + { > > + tree op0 = instantiate_scev_r (instantiate_below, > > evolution_loop, > > + inner_loop, > > gimple_assign_rhs1 (ass), > > + fold_conversions, size_expr); > > + if (op0 == chrec_dont_know) > > + return chrec_dont_know; > > + tree op1 = instantiate_scev_r (instantiate_below, > > evolution_loop, > > + inner_loop, > > gimple_assign_rhs2 (ass), > > + fold_conversions, size_expr); > > + if (op1 == chrec_dont_know) > > + return chrec_dont_know; > > + res = fold_build2 (gimple_assign_rhs_code (ass), > > + TREE_TYPE (chrec), op0, op1); > > + break; > > + } > > + default: > > + res = chrec_dont_know; > > + } > > + } > > + else > > + res = chrec_dont_know; > > + global_cache->set (si, res); > > + return res; > > + } > > + > > /* If the analysis yields a parametric chrec, instantiate the > > result again. */ > > res = analyze_scalar_evolution (def_loop, chrec); > > IIUC, after changing instantiate_scev_r from loop based to region > based, there are > two cases. In one case, def_loop is dominated by instantiate_edge, > which we'd like > to analyze/instantiate scev wrto the new def_loop; In the other case, > def_loop is not > fully part of sese region, which we'd like to expand ssa_name wrto the > basic block > instantiate_edge->dest. It's simply ssa expanding and no loop is involved.
Note there can be still a loop involved if the SSA chain arrives at a DEF that is defined within a loop inbetween the current def and instantiate_edge->dest. In that case we need to process the compute_overall_effect_of_inner_loop case. > So how about factor out above big if-statement into a function with name like > expand_scev_name (Other better names?). The code like: > > /* Some comment explaining the two cases in region based instantiation. */ > if (dominated_by_p (CDI_DOMINATORS, def_loop->header, > instantiate_below->dest)) > res = analyze_scalar_evolution (def_loop, chrec); > else > res = expand_scev_name (instantiate_below, chrec); > > could be easier to read? Note it would be else { res = expand_scev_name (instantiate_below, chrec); global_cache->set (si, res); return res; } and expand_scev_name would still need to recurse via instantiate_scev. It isn't merely gathering all expressions up to instantiate_edge->dest from the stmts (see above). But yes, factoring the above might be a good idea. The above is also not 100% equivalent in capabilities as the rest of the instantiation machinery -- it lacks condition PHI handling and it fails to capture the loop-closed PHI handling (so the above mentioned case wouldn't be handled right now). I sent the patch out for comments early before fleshing out all those pesky details ;) It's at least conservative correct right now. Richard.