Jiufu Guo <guoji...@linux.ibm.com> writes: > Richard Biener <rguent...@suse.de> writes: > >> On Thu, 13 Jan 2022, Jiufu Guo wrote: > ... >> >>> - /* No need to check sign of the new step since below code takes care >>> - of this well. */ >>> + /* Like cases shown in PR100740/102131, negtive step is not safe. */ >>> + if (tree_int_cst_sign_bit (step)) >>> + return false; >>> + >>> if (code != NE_EXPR >>> && (TREE_CODE (step) != INTEGER_CST >>> || !iv0->no_overflow || !iv1->no_overflow)) >> >> I think for NE_EXPR the sign is not relevant. I think the key is >> that we require that iv0->no_overflow and for non-equality checks >> adjusting X + C1 < Y + C2 to X + C1 - C2 < Y is only valid if that >> does not cause any overflow on either side. On the LHS this is > > Hi Richard, > > Thanks a lot for your comments and ideas! > > Right! The adjusting is safe only if we can make sure there is > no overflow/wrap on either side or say there is no overflow/wrap > on three 'iv's: {X,C1}, {Y,C2} and {X, C1 - C2}. > > Or it may also ok if we can compute an assumption, under which > all three ivs are not overflowed/wrapped. > >> only guaranteed if the absolute value of C1 - C2 is smaller than >> that of C1 and if it has the same sign. > I'm thinking this in another way: > When trying to do this transform in number_of_iterations_cond, > GT/GE is inverted to LT/LE, then the compare code would be: > LT/LE or NE. > > For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is > chasing iv1. We would able to assume X < Y (may_be_zero would > be set later via number_of_iterations_lt/le). > 1. If C1 < C2, iv0 can never catch up iv1. For examples: > {X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}. > And there must be at least one overflow/wrap in iv0,iv1, or iv. > This indicates, if the sign of (C1 - C1) is negative, then the > transform would be incorrect. > 2. If C1 > C2, we still need to make sure all the ivs (iv0, > iv1 and combined iv) are not wrapped. > For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up. > the assumption may like : (MAX-Y)/C2 > (Y-X)/(C1-C1) There is still some cases: iv0 step is too large, then iv0 wraps first, e.g. {MAX-5, 10} < {MAX-3, 1}. For this, the assumption would need to and with (MAX-X)/C1 > (Y-X)/(C1-C1).
> For C1 < 0, {X,C1} should not down cross MIN > the assumption may like : (X-MIN)/-C1 > (Y-X)/(C1-C1) Also add the assumption: (Y-MIN)/-C2 > (Y-X)/(C1-C1) > For C1 > 0 and C2 < 0, iv0 and iv1 are walking to each other, > it would be almost safe. For this case, we may still add the assumption to avoid wraping at the first iteration. BR, Jiufu > > For NE, it seems more interesting. The transformation depends > on 3 things: 1. the relation between X and Y; 2 the sign > of (C1-C2); 3. if iv0 and iv1 can be equal finally. > The 3rd one may be more special. > The good news is, number_of_iterations_ne seems able to handle NE. > >> >> With the same reasoning we then know the new IV0 doesn't overflow. >> >> So something like the following. IIRC I've proposed sth similar >> a while back. I'm going to give it some testing, collecting >> testcases from related PRs. >> >> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc >> index b767056aeb0..74fa4f66ee2 100644 >> --- a/gcc/tree-ssa-loop-niter.cc >> +++ b/gcc/tree-ssa-loop-niter.cc >> @@ -1890,17 +1890,28 @@ number_of_iterations_cond (class loop *loop, >> tree step = fold_binary_to_constant (MINUS_EXPR, step_type, >> iv0->step, iv1->step); >> >> - /* No need to check sign of the new step since below code takes >> care >> - of this well. */ >> - if (code != NE_EXPR >> - && (TREE_CODE (step) != INTEGER_CST >> - || !iv0->no_overflow || !iv1->no_overflow)) >> - return false; >> + /* For code other than NE_EXPR we have to ensure moving the >> evolution >> + of IV1 to that of IV0 does not introduce overflow. */ >> + if (TREE_CODE (step) != INTEGER_CST >> + || !iv0->no_overflow || !iv1->no_overflow) >> + { > I was also trying to leverage no_overflow of iv0 and iv1. While it seems > the computation logic of no_overflow is related to the type of IV. If the > type of IV is signed, the C semantics may be used, overflow in signed > IV are treated UB, and then no_overflow would be true. > > For unsigned IV, no_overflow would be false, even for the cases which > looks like: > "{10, 2} < {20, 1}", which would be ok to compute niter. > > BR, > Jiufu > >> + if (code != NE_EXPR) >> + return false; >> + iv0->no_overflow = false; >> + } >> + /* If the new step of IV0 has changed sign or is of greater >> + magnitude then we do not know whether IV0 does overflow >> + and thus the transform is not valid for code other than NE_EXPR >> */ >> + else if (tree_int_cst_sign_bit (step) != tree_int_cst_sign_bit >> (iv0->step) >> + || wi::gtu_p (wi::abs (wi::to_widest (step)), >> + wi::abs (wi::to_widest (iv0->step)))) >> + { >> + if (code != NE_EXPR) >> + return false; >> + iv0->no_overflow = false; >> + } >> >> iv0->step = step; >> - if (!POINTER_TYPE_P (type)) >> - iv0->no_overflow = false; >> - >> iv1->step = build_int_cst (step_type, 0); >> iv1->no_overflow = true; >> } >> >> >>> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr100740.c >>> b/gcc/testsuite/gcc.c-torture/execute/pr100740.c >>> new file mode 100644 >>> index 00000000000..381cdeb947a >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.c-torture/execute/pr100740.c >>> @@ -0,0 +1,13 @@ >>> +/* PR tree-optimization/100740 */ >>> + >>> +unsigned a, b; >>> +int >>> +main () >>> +{ >>> + unsigned c = 0; >>> + for (a = 0; a < 2; a++) >>> + for (b = 0; b < 2; b++) >>> + if (++c < a) >>> + __builtin_abort (); >>> + return 0; >>> +} >>>