Richard Biener <rguent...@suse.de> writes: > On Thu, 13 Jan 2022, guojiufu wrote: > >> On 2022-01-03 22:30, Richard Biener wrote: >> > On Wed, 22 Dec 2021, Jiufu Guo wrote: >> > >> >> Hi, >> >> ... >> >> >> >> Bootstrap and regtest pass on ppc64* and x86_64. Is this ok for trunk? >> > >> > So this is a optimality fix, not a correctness one? I suppose the >> > estimates are computed/used from scev_probably_wraps_p via >> > loop_exits_before_overflow and ultimatively chrec_convert. >> > >> > We have a call cycle here, >> > >> > estimate_numbers_of_iterations -> number_of_latch_executions -> >> > ... -> estimate_numbers_of_iterations >> > >> > where the first estimate_numbers_of_iterations will make sure >> > the later call will immediately return. >> >> Hi Richard, >> Thanks for your comments! And sorry for the late reply. >> >> In estimate_numbers_of_iterations, there is a guard to make sure >> the second call to estimate_numbers_of_iterations returns >> immediately. >> >> Exactly as you said, it relates to scev_probably_wraps_p calls >> loop_exits_before_overflow. >> >> The issue is: the first calling to estimate_numbers_of_iterations >> maybe inside number_of_latch_executions. >> >> > >> > I'm not sure what your patch tries to do - it seems to tackle >> > the case where we enter the cycle via number_of_latch_executions? >> > Why do we get "non-final" values? idx_infer_loop_bounds resorts >> >> Right, when the call cycle starts from number_of_latch_execution, >> the issue may occur: >> >> number_of_latch_executions(*1st call)->..-> >> analyze_scalar_evolution(IVs 1st) ->..follow_ssa_edge_expr..-> >> loop_exits_before_overflow-> >> estimate_numbers_of_iterations (*1st call)-> >> number_of_latch_executions(*2nd call)->..-> >> analyze_scalar_evolution(IVs 2nd)->..loop_exits_before_overflow-> >> estimate_numbers_of_iterations(*2nd call) >> >> The second calling to estimate_numbers_of_iterations returns quickly. >> And then, in the first calling to estimate_numbers_of_iterations, >> infer_loop_bounds_from_undefined is invoked. >> >> And, function "infer_loop_bounds_from_undefined" instantiate/analyze >> SCEV for each SSA in the loop. >> *Here the issue occur*, these SCEVs are based on the interim IV's >> SCEV which come from "analyze_scalar_evolution(IVs 2nd)", >> and those IV's SCEV will be overridden by up level >> "analyze_scalar_evolution(IVs 1st)". > > OK, so indeed analyze_scalar_evolution is not protected against > recursive invocation on the same SSA name (though it definitely > doesn't expect to do that). We could fix that by pre-seeding > the cache conservatively in analyze_scalar_evolution or by > not overwriting the cached result of the recursive invocation. > > But to re-iterate an unanswered question, is this a correctness issue > or an optimization issue?
Hi Richard, Thanks for your time and patience on review this! I would say it is an optimization issue for the current code, it does not fix known error. The patch could help compiling-time. Another benefit, this patch would be able to improve some scev(s) if the scev is cached in infer_loop_bounds_from_undefined under the call stack where IV's SCEV is under analyzing. Yes, in analyze_scalar_evolution call chain, it may recursive on same SSA name. While outer level analyze_scalar_evolution 'may' get better chrec(POLYNOMIAL_CHREC), inner one may get other scev (e.g. conversion). I'm even wondering this recursive is intended :). It may help to handle the chick-egg issue(wrap vs. niter). > >> To handle this issue, disabling flag_aggressive_loop_optimizations >> inside number_of_latch_executions is one method. >> To avoid the issue in other cases, e.g. the call cycle starts from >> number_of_iterations_exit or number_of_iterations_exit_assumptions, >> this patch disable flag_aggressive_loop_optimizations inside >> number_of_iterations_exit_assumptions. > > But disabling flag_aggressive_loop_optimizations is a very > non-intuitive way of avoiding recursive calls. I'd rather > avoid those in a similar way estimate_numbers_of_iterations does, > for example with > > diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c > index 61d72c278a1..cc1e510b6c2 100644 > --- a/gcc/tree-scalar-evolution.c > +++ b/gcc/tree-scalar-evolution.c > @@ -2807,7 +2807,7 @@ number_of_latch_executions (class loop *loop) > if (dump_file && (dump_flags & TDF_SCEV)) > fprintf (dump_file, "(number_of_iterations_in_loop = \n"); > > - res = chrec_dont_know; > + loop->nb_iterations = res = chrec_dont_know; > exit = single_exit (loop); > > if (exit && number_of_iterations_exit (loop, exit, &niter_desc, false)) > > though this doesn't seem to improve the SCEV analysis with your > testcase. Alternatively one could more conciously compute an > "estimated" estimate like with > > diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c > index 61d72c278a1..8529c44d574 100644 > --- a/gcc/tree-scalar-evolution.c > +++ b/gcc/tree-scalar-evolution.c > @@ -2802,6 +2802,19 @@ number_of_latch_executions (class loop *loop) > if (res) > return res; > > + /* When estimates for this loop are not computed yet compute them > + without resorting to niter analysis results which means w/o > + flag_aggressive_loop_optimizations. */ > + if (loop->estimate_state == EST_NOT_COMPUTED) > + { > + bool saved_flag_aggressive_loop_optimizations > + = flag_aggressive_loop_optimizations; > + flag_aggressive_loop_optimizations = false; > + estimate_numbers_of_iterations (loop); > + flag_aggressive_loop_optimizations > + = saved_flag_aggressive_loop_optimizations; > + } > + > may_be_zero = NULL_TREE; > > but that also doesn't change your testcase for me. Applying your > patch does the trick but then I really wonder why. > > (get_scalar_evolution > (scalar = lv_10) > (scalar_evolution = {(int) start_7(D), +, 1}_1)) > > from > > <bb 3> [local count: 955630225]: > # i_15 = PHI <i_12(6), start_7(D)(5)> > lv_10 = (int) i_15; > i.1_3 = (unsigned short) i_15; > _4 = i.1_3 + 1; > i_12 = (short int) _4; > > where the 'i' IV can wrap when start >= end but that's ruled out > by the entry test. The scalar evolution for lv_10 would be > wrong if we didn't conclude that so I guess this optimization > is provided by the estimate somehow. In the recursive chain: loop_distribution->number_of_latch_executions analyze_scalar_evolution(i_15,1)->follow_ssa_edge_expr->chrec_convert ->loop_exits_before_overflow(1st)->estimate_numbers_of_iterations(1st) ->analyze_scalar_evolution(i_15,2)->chrec_convert-> loop_exits_before_overflow(2nd)->estimate_numbers_of_iterations(2nd) 1. Because estimate_numbers_of_iterations(2nd) returns quickly, then loop_exits_before_overflow(2nd) returns false; and then the inner analyze_scalar_evolution(i_15,2) gets "(short int) {(unsigned short) start_7(D), +, 1}_1" for i_15. 2. When call stack back to loop_exits_before_overflow(1st), there is some loop info (e.g. control_ivs) ready to check. It confirms no overflows on the subject. And when back to analyze_scalar_evolution(i_15,1), it gets "{()start_7(D), +, 1}_1" for i_15. 3. If flag_aggressive_loop_optimizations is on, lv_10's scev is computed in 'estimate_numbers_of_iterations(1st)' through function 'infer_loop_bounds_from_undefined'. At that time, i_15's scev is still '(short int) {(unsigned short) start_7(D), +, 1}_1'. And then "(int) (short int) {(unsigned short) start_7(D), +, 1}_1" is cached for lv_10=(int)i_15. > > I also tried > > diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c > index 935d2d4d8f3..d1787ba39b6 100644 > --- a/gcc/tree-ssa-loop-ivopts.c > +++ b/gcc/tree-ssa-loop-ivopts.c > @@ -8093,6 +8093,8 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, > class loop *loop, > fprintf (dump_file, "\n"); > } > > + estimate_numbers_of_iterations (loop); > + > body = get_loop_body (loop); > data->body_includes_call = loop_body_includes_call (body, > loop->num_nodes); > renumber_gimple_stmt_uids_in_blocks (body, loop->num_nodes); > > to get into the cycles from the "correct" entry but that does > not help either. Nor does > > diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c > index b767056aeb0..f058be04836 100644 > --- a/gcc/tree-ssa-loop-niter.c > +++ b/gcc/tree-ssa-loop-niter.c > @@ -2534,6 +2534,14 @@ number_of_iterations_exit_assumptions (class loop > *loop, edge exit, > && !POINTER_TYPE_P (type)) > return false; > > + if (loop->estimate_state == EST_NOT_COMPUTED) > + { > + bool saved = flag_aggressive_loop_optimizations; > + flag_aggressive_loop_optimizations = false; > + estimate_numbers_of_iterations (loop); > + flag_aggressive_loop_optimizations = saved; > + } > + > tree iv0_niters = NULL_TREE; > if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt), > op0, &iv0, safe ? &iv0_niters : NULL, > false)) > > or > > diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c > index 61d72c278a1..d1af89eb459 100644 > --- a/gcc/tree-scalar-evolution.c > +++ b/gcc/tree-scalar-evolution.c > @@ -518,7 +518,8 @@ set_scalar_evolution (basic_block instantiated_below, > tree scalar, tree chrec) > nb_set_scev++; > } > > - *scalar_info = chrec; > + if (*scalar_info == chrec_not_analyzed_yet) > + *scalar_info = chrec; > } > > /* Retrieve the chrec associated to SCALAR instantiated below > > That said, having the cycles is bad, we should definitively break > them (if only for compile-time reasons). But I don't really > understand how your patch helps and my attempts do not which > means I am missing a critical piece of understanding ... :/ This patch disables flag_aggressive_loop_optimizations before analyze_scalar_evolution is called, then lv_10's scev is not computed during this call cycle. lv_10's scev is calculated when it other passes (e.g. ivopt) request, at that time i_15 has 'final' scev. I had also tried to disable recursive from analyze_scalar_evolution on the same ssa name(i_15), and directly get the final result. But I'm not finding a good way yet. Again thanks for your suggestions! Thanks! Jiufu > > Thanks, > Richard. > >> Thanks again. >> >> BR, >> Jiufu >> >> > to SCEV and thus may recurse again - to me it would be more >> > logical to try avoid recursing in number_of_latch_executions by >> > setting ->nb_iterations to something early, maybe chrec_dont_know, >> > to signal we're using something we're just trying to compute. >> > >> > Richard. >> > >> >> BR, >> >> Jiufu >> >> >> >> gcc/ChangeLog: >> >> >> >> * tree-ssa-loop-niter.c (number_of_iterations_exit_assumptions): >> >> Disable/restore flag_aggressive_loop_optimizations. >> >> >> >> gcc/testsuite/ChangeLog: >> >> >> >> * gcc.dg/tree-ssa/scev-16.c: New test. >> >> >> >> --- >> >> gcc/tree-ssa-loop-niter.c | 23 +++++++++++++++++++---- >> >> gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 20 ++++++++++++++++++++ >> >> 2 files changed, 39 insertions(+), 4 deletions(-) >> >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c >> >> >> >> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c >> >> index 06954e437f5..51bb501019e 100644 >> >> --- a/gcc/tree-ssa-loop-niter.c >> >> +++ b/gcc/tree-ssa-loop-niter.c >> >> @@ -2534,18 +2534,31 @@ number_of_iterations_exit_assumptions (class loop >> >> *loop, edge exit, >> >> && !POINTER_TYPE_P (type)) >> >> return false; >> >> >> >> + /* Before niter is calculated, avoid to analyze interim state. */ >> >> + int old_aggressive_loop_optimizations = >> >> flag_aggressive_loop_optimizations; >> >> + flag_aggressive_loop_optimizations = 0; >> >> + >> >> tree iv0_niters = NULL_TREE; >> >> if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt), >> >> op0, &iv0, safe ? &iv0_niters : NULL, false)) >> >> - return number_of_iterations_popcount (loop, exit, code, niter); >> >> + { >> >> + bool res = number_of_iterations_popcount (loop, exit, code, niter); >> >> + flag_aggressive_loop_optimizations = >> >> old_aggressive_loop_optimizations; >> >> + return res; >> >> + } >> >> tree iv1_niters = NULL_TREE; >> >> if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt), >> >> op1, &iv1, safe ? &iv1_niters : NULL, false)) >> >> - return false; >> >> + { >> >> + flag_aggressive_loop_optimizations = >> >> old_aggressive_loop_optimizations; >> >> + return false; >> >> + } >> >> /* Give up on complicated case. */ >> >> if (iv0_niters && iv1_niters) >> >> - return false; >> >> - >> >> + { >> >> + flag_aggressive_loop_optimizations = >> >> old_aggressive_loop_optimizations; >> >> + return false; >> >> + } >> >> /* We don't want to see undefined signed overflow warnings while >> >> computing the number of iterations. */ >> >> fold_defer_overflow_warnings (); >> >> @@ -2565,6 +2578,7 @@ number_of_iterations_exit_assumptions (class loop >> >> *loop, edge exit, >> >> only_exit_p, safe)) >> >> { >> >> fold_undefer_and_ignore_overflow_warnings (); >> >> + flag_aggressive_loop_optimizations = >> >> old_aggressive_loop_optimizations; >> >> return false; >> >> } >> >> >> >> @@ -2608,6 +2622,7 @@ number_of_iterations_exit_assumptions (class loop >> >> *loop, edge exit, >> >> niter->may_be_zero); >> >> >> >> fold_undefer_and_ignore_overflow_warnings (); >> >> + flag_aggressive_loop_optimizations = old_aggressive_loop_optimizations; >> >> >> >> /* If NITER has simplified into a constant, update MAX. */ >> >> if (TREE_CODE (niter->niter) == INTEGER_CST) >> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c >> >> b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c >> >> new file mode 100644 >> >> index 00000000000..708ffab88ca >> >> --- /dev/null >> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c >> >> @@ -0,0 +1,20 @@ >> >> +/* { dg-do compile } */ >> >> +/* { dg-options "-O2 -fdump-tree-ivopts-scev" } */ >> >> + >> >> +/* Expect scalar_evolution = {(int) start_7(D), +, 1}_1), instead >> >> + (int) (short int) {(unsigned short) start_7(D), +, 1}_1 */ >> >> + >> >> +int arr[1000]; >> >> + >> >> +void >> >> +s2i (short start, short end) >> >> +{ >> >> + int res = 0; >> >> + for (short i = start; i < end; i++) >> >> + { >> >> + int lv = i; >> >> + arr[lv] += lv; >> >> + } >> >> +} >> >> + >> >> +/* { dg-final { scan-tree-dump-times "scalar_evolution = \{\\(int\\) >> >> start_\[0-9\]+\\(D\\), \\+, 1\}_1" 1 "ivopts" } } */ >> >> >> >>