On Tue, 25 Jan 2022, Richard Biener wrote: > On Tue, 25 Jan 2022, Jiufu Guo wrote: > > > Richard Biener <rguent...@suse.de> writes: > > > > > On Tue, 25 Jan 2022, Jiufu Guo wrote: > > > > > >> Jiufu Guo <guoji...@linux.ibm.com> writes: > > >> > > >> > Richard Biener <rguent...@suse.de> writes: > > >> > > > >> >> On Thu, 13 Jan 2022, Jiufu Guo wrote: > > >> > ... > > >> >> > > >> >>> - /* No need to check sign of the new step since below code > > >> >>> takes care > > >> >>> - of this well. */ > > >> >>> + /* Like cases shown in PR100740/102131, negtive step is not > > >> >>> safe. */ > > >> >>> + if (tree_int_cst_sign_bit (step)) > > >> >>> + return false; > > >> >>> + > > >> >>> if (code != NE_EXPR > > >> >>> && (TREE_CODE (step) != INTEGER_CST > > >> >>> || !iv0->no_overflow || !iv1->no_overflow)) > > >> >> > > >> >> I think for NE_EXPR the sign is not relevant. I think the key is > > >> >> that we require that iv0->no_overflow and for non-equality checks > > >> >> adjusting X + C1 < Y + C2 to X + C1 - C2 < Y is only valid if that > > >> >> does not cause any overflow on either side. On the LHS this is > > >> > > > >> > Hi Richard, > > >> > > > >> > Thanks a lot for your comments and ideas! > > >> > > > >> > Right! The adjusting is safe only if we can make sure there is > > >> > no overflow/wrap on either side or say there is no overflow/wrap > > >> > on three 'iv's: {X,C1}, {Y,C2} and {X, C1 - C2}. > > Hi Richard, > > > > Thanks for your quickly reply, and patient review! > > > > > > The point is that we may not change the iteration number at which > > > overflow occurs since that alters the result of the < compare. > > > Only if we know there is no overflow with the present expression > > > during the loop evaluation we can do the transform and then only > > > if we do not introduce overflow. > > Exactly, this is also what I mean :) > > > > > > > > We are basically doing the transform that fold_comparison > > > in fold-const.cc does: > > > > > > /* Transform comparisons of the form X +- C1 CMP Y +- C2 to > > > X CMP Y +- C2 +- C1 for signed X, Y. This is valid if > > > the resulting offset is smaller in absolute value than the > > > original one and has the same sign. */ > > > if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0)) > > > && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0)) > > > ... > > > > > This transform seems not the scenario which we are care about in > > number_of_iterations_cond. > > For example, 'X + 1 < Y + 4' ==> 'X < Y + 3' would be correct if > > no overflow happen. > > But for loop, the niter for '{X, 1} < {Y, 4}' would be totally > > different with niter for '{X, 0} < {Y, 3}'. > > for (iv0 = X, iv1 = Y; iv0 < iv1; iv0 += 1, iv1 += 4) > > in this loop, iv1 walks to overflow faster, step is 4. > > vs. > > for (iv0 = X, iv1 = Y; iv0 < iv1; iv1 += 3) (iv1 overflow slow) > > in this loop, iv1 overflows slower, step is 3. > > Huh? But we are _exactly_ doing this, analyzing {X, + 1} < {Y, + 4} > as X < {Y, + 3} (well, OK, we're only trying {X, -3} which now > fails - we should try the other way around as well). > > > Actually, the transformation 'X + 1 < Y + 4' ==> 'X < Y + 3', > > may not always correct. e.g. for below code, X=6, and Y=2147483645 > > it may output "b0 + 1 < b1 + 4 is true". > > But Y + 4 overflows with 2147483645 so X + 1 < Y + 4 invokes UB > and we can ignore this situation. > > > ```c > > int __attribute__ ((noinline)) > > foo (int b0, int b1) > > { > > return __builtin_printf ("b0 + 1 < b1 + 4 is %s\n", > > b0 + 1 < b1 + 4 ? "true" : "false"); > > } > > > > int main(int argc, char **argv) > > { > > if (argc < 2) > > return -1; > > int b0 = atoi(argv[1]); > > int b1 = atoi(argv[2]); > > return foo (b0, b1); > > } > > ``` > > >> > Or it may also ok if we can compute an assumption, under which > > >> > all three ivs are not overflowed/wrapped. > > >> > > > >> >> only guaranteed if the absolute value of C1 - C2 is smaller than > > >> >> that of C1 and if it has the same sign. > > >> > I'm thinking this in another way: > > >> > When trying to do this transform in number_of_iterations_cond, > > >> > GT/GE is inverted to LT/LE, then the compare code would be: > > >> > LT/LE or NE. > > >> > > > >> > For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is > > >> > chasing iv1. We would able to assume X < Y (may_be_zero would > > >> > be set later via number_of_iterations_lt/le). > > >> > 1. If C1 < C2, iv0 can never catch up iv1. For examples: > > >> > {X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}. > > >> > And there must be at least one overflow/wrap in iv0,iv1, or iv. > > >> > This indicates, if the sign of (C1 - C1) is negative, then the > > >> > transform would be incorrect. > > >> > 2. If C1 > C2, we still need to make sure all the ivs (iv0, > > >> > iv1 and combined iv) are not wrapped. > > >> > For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up. > > >> > the assumption may like : (MAX-Y)/C2 > (Y-X)/(C1-C1) > > >> There is still some cases: iv0 step is too large, then iv0 wraps > > >> first, e.g. {MAX-5, 10} < {MAX-3, 1}. For this, the assumption > > >> would need to and with (MAX-X)/C1 > (Y-X)/(C1-C1). > > >> > > >> > For C1 < 0, {X,C1} should not down cross MIN > > >> > the assumption may like : (X-MIN)/-C1 > (Y-X)/(C1-C1) > > >> Also add the assumption: (Y-MIN)/-C2 > (Y-X)/(C1-C1) > > >> > > >> > For C1 > 0 and C2 < 0, iv0 and iv1 are walking to each other, > > >> > it would be almost safe. > > >> For this case, we may still add the assumption to avoid wraping > > >> at the first iteration. > > >> > > >> BR, > > >> Jiufu > > >> > > >> > > > >> > For NE, it seems more interesting. The transformation depends > > >> > on 3 things: 1. the relation between X and Y; 2 the sign > > >> > of (C1-C2); 3. if iv0 and iv1 can be equal finally. > > >> > The 3rd one may be more special. > > >> > The good news is, number_of_iterations_ne seems able to handle NE. > > >> > > > >> >> > > >> >> With the same reasoning we then know the new IV0 doesn't overflow. > > >> >> > > >> >> So something like the following. IIRC I've proposed sth similar > > >> >> a while back. I'm going to give it some testing, collecting > > >> >> testcases from related PRs. > > >> >> > > >> >> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc > > >> >> index b767056aeb0..74fa4f66ee2 100644 > > ... > > >> >> + if (TREE_CODE (step) != INTEGER_CST > > >> >> + || !iv0->no_overflow || !iv1->no_overflow) > > >> >> + { > > >> > I was also trying to leverage no_overflow of iv0 and iv1. While it > > >> > seems > > >> > the computation logic of no_overflow is related to the type of IV. If > > >> > the > > >> > type of IV is signed, the C semantics may be used, overflow in signed > > >> > IV are treated UB, and then no_overflow would be true. > > >> > > > >> > For unsigned IV, no_overflow would be false, even for the cases which > > >> > looks like: > > >> > "{10, 2} < {20, 1}", which would be ok to compute niter. > > > > > > IIRC no_overflow is determined by SCEV which might also use niter > > > analysis. For the case of {10, +2} < {20, +1} there is no need to > > > compute it as {10, +1} < 20 and we hopefully deal with this in > > > other code paths (in fact with base and step all constant we > > > can simply solve the linear equation for 'n' - maybe that's a > > > capability we should add to number_of_iterations_cond). > > > > Thanks for point this out. > > Yes, for const base(s) and step(s), we have other code path > > to deal with (e.g. loop_niter_by_eval). > > > > For {10, +2}, what I really mean is about the no_overflow iv(s) > > on unsigned. Sorry the misleading words. > > For no_overflow, it is set at some places, including > > number_of_iterations_xxx. :), Before number_of_iterations_xxx, > > no_overflow could be calculated in simple_iv_with_niters: > > ```c > > iv->no_overflow = !folded_casts && nowrap_type_p (type); > > ``` > > nowrap_type_p checks if overflow is UB on type through macro > > TYPE_OVERFLOW_UNDEFINED. For signed, it is UB; for unsigned > > it is different. > > Yes. > > > For example as below code, no_overflow is set as false for iv0 > > and iv1, and then niter was not computed quickly. > > ```c > > unsigned __attribute__ ((noinline)) > > foo (unsigned b0, unsigned b1) > > { > > unsigned n = 0; > > for (; b0 < b1; b0 += 2, b1 += 1) > > n++; > > return n; > > } > > There's no difference to before my patch of course. There are > some code paths in number_of_iterations_lt that use assumptions > to prove the combined IV does not wrap, just for this case > we give up too early. I'm currently looking at rectifying this > with small incremental changes.
Like the one below which should handle the PR81196 case when integer IVs are used. I suspect we can do something similar for IVs where we do not know the original overflow status (we have to register assumptions for both original IVs _and_ the new adjusted one). And as noted we can try rewriting to the other IV. I do wonder how important these are and what improvements we need to include in backports (I think we do want to fix the original issue on branches). Richard. >From f46855709dd45603d18f2dcd8403f5b060c164f0 Mon Sep 17 00:00:00 2001 From: Richard Biener <rguent...@suse.de> Date: Tue, 25 Jan 2022 13:11:57 +0100 Subject: [PATCH] tree-optimization/104214 - improve IV analysis with integer IV compares To: gcc-patches@gcc.gnu.org For rewriting BASE0 + STEP0 cmp BASE1 + STEP1 as BASE0 + STEP0 - STEP1 cmp BASE1 for signed integers we can use niter assumptions to ensure that BASE0 + STEP0 - STEP1 does not overflow instead of giving up when we cannot prove this statically. We can use the existing assert_no_overflow_lt for this and make it efficient for LE_EXPR also by rewriting LE_EXPR IV compares to LT_EXPR earlier. 2022-01-25 Richard Biener <rguent...@suse.de> PR tree-optimization/104214 * tree-ssa-loop-niter.cc (number_of_iterations_le): Refactor into ... (number_of_iterations_le_to_lt): ... this, just doing the iv->base rewriting and assumption registering. (number_of_iterations_cond): Rewrite LE_EXPR into LT_EXPR earlier. When rewriting BASE0 + STEP0 cmp BASE1 + STEP1 as BASE0 + STEP0 - STEP1 cmp BASE1 would fail for LT_EXPR because of possible overflow register assumptions instead. * gcc.dg/vect/pr81196-3.c: New testcase variant. --- gcc/testsuite/gcc.dg/vect/pr81196-3.c | 12 +++++ gcc/tree-ssa-loop-niter.cc | 67 +++++++++++++++------------ 2 files changed, 49 insertions(+), 30 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/pr81196-3.c diff --git a/gcc/testsuite/gcc.dg/vect/pr81196-3.c b/gcc/testsuite/gcc.dg/vect/pr81196-3.c new file mode 100644 index 00000000000..bcdd815dc5d --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr81196-3.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ + +void b (int *p, int j, int k) +{ + p = (int *)__builtin_assume_aligned(p, __BIGGEST_ALIGNMENT__); + int i = 0; + for(; j < k; ++j, --k) + p[i++] = 1; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc index d33095b8e03..b5f3d4b4a8d 100644 --- a/gcc/tree-ssa-loop-niter.cc +++ b/gcc/tree-ssa-loop-niter.cc @@ -1721,17 +1721,17 @@ number_of_iterations_lt (class loop *loop, tree type, affine_iv *iv0, return true; } -/* Determines number of iterations of loop whose ending condition - is IV0 <= IV1. TYPE is the type of the iv. The number of - iterations is stored to NITER. EXIT_MUST_BE_TAKEN is true if - we know that this condition must eventually become false (we derived this +/* Rewrite the IV0 <= IV1 condition to IV0 < IV1 by adjusting one of + the IVs bases. TYPE is the type of the iv. Assumptions are + recorded to NITER. EXIT_MUST_BE_TAKEN is true if we know that this + condition must eventually become false (we derived this earlier, and possibly set NITER->assumptions to make sure this - is the case). BNDS bounds the difference IV1->base - IV0->base. */ + is the case). */ static bool -number_of_iterations_le (class loop *loop, tree type, affine_iv *iv0, - affine_iv *iv1, class tree_niter_desc *niter, - bool exit_must_be_taken, bounds *bnds) +number_of_iterations_le_to_lt (tree type, affine_iv *iv0, affine_iv *iv1, + class tree_niter_desc *niter, + bool exit_must_be_taken) { tree assumption; tree type1 = type; @@ -1777,10 +1777,7 @@ number_of_iterations_le (class loop *loop, tree type, affine_iv *iv0, iv0->base = fold_build2 (MINUS_EXPR, type1, iv0->base, build_int_cst (type1, 1)); - bounds_add (bnds, 1, type1); - - return number_of_iterations_lt (loop, type, iv0, iv1, niter, exit_must_be_taken, - bnds); + return true; } /* Dumps description of affine induction variable IV to FILE. */ @@ -1862,6 +1859,17 @@ number_of_iterations_cond (class loop *loop, code = swap_tree_comparison (code); } + /* If the loop exits immediately, there is nothing to do. */ + tree tem = fold_binary (code, boolean_type_node, iv0->base, iv1->base); + if (tem && integer_zerop (tem)) + { + if (!every_iteration) + return false; + niter->niter = build_int_cst (unsigned_type_for (type), 0); + niter->max = 0; + return true; + } + if (POINTER_TYPE_P (type)) { /* Comparison of pointers is undefined unless both iv0 and iv1 point @@ -1884,6 +1892,15 @@ number_of_iterations_cond (class loop *loop, exit_must_be_taken = true; } + /* Turn LE_EXPR to LT_EXPR, registering required assumptions. */ + if (code == LE_EXPR) + { + if (!number_of_iterations_le_to_lt (type, iv0, iv1, niter, + exit_must_be_taken)) + return false; + code = LT_EXPR; + } + /* We can handle cases which neither of the sides of the comparison is invariant: @@ -1928,15 +1945,21 @@ number_of_iterations_cond (class loop *loop, pointer compares, we also know the resulting IV does not overflow. */ ; - else if (code != NE_EXPR) - return false; else + /* For LT_EXPR we register the assumptions necessary for + the adjusted IV0 to not overflow. */ iv0->no_overflow = false; } iv0->step = step; iv1->step = build_int_cst (step_type, 0); iv1->no_overflow = true; + if (code == LT_EXPR && !iv0->no_overflow) + { + if (!assert_no_overflow_lt (type, iv0, iv1, niter, step)) + return false; + /* We will now have iv0->no_overflow == true again. */ + } } /* If the result of the comparison is a constant, the loop is weird. More @@ -1945,17 +1968,6 @@ number_of_iterations_cond (class loop *loop, if (integer_zerop (iv0->step) && integer_zerop (iv1->step)) return false; - /* If the loop exits immediately, there is nothing to do. */ - tree tem = fold_binary (code, boolean_type_node, iv0->base, iv1->base); - if (tem && integer_zerop (tem)) - { - if (!every_iteration) - return false; - niter->niter = build_int_cst (unsigned_type_for (type), 0); - niter->max = 0; - return true; - } - /* OK, now we know we have a senseful loop. Handle several cases, depending on what comparison operator is used. */ bound_difference (loop, iv1->base, iv0->base, &bnds); @@ -1994,11 +2006,6 @@ number_of_iterations_cond (class loop *loop, exit_must_be_taken, &bnds); break; - case LE_EXPR: - ret = number_of_iterations_le (loop, type, iv0, iv1, niter, - exit_must_be_taken, &bnds); - break; - default: gcc_unreachable (); } -- 2.31.1