On Tue, 25 Jan 2022, Richard Biener wrote:

> On Tue, 25 Jan 2022, Jiufu Guo wrote:
> 
> > Richard Biener <rguent...@suse.de> writes:
> > 
> > > On Tue, 25 Jan 2022, Jiufu Guo wrote:
> > >
> > >> Jiufu Guo <guoji...@linux.ibm.com> writes:
> > >> 
> > >> > Richard Biener <rguent...@suse.de> writes:
> > >> >
> > >> >> On Thu, 13 Jan 2022, Jiufu Guo wrote:
> > >> > ...
> > >> >>
> > >> >>> -      /* No need to check sign of the new step since below code 
> > >> >>> takes care
> > >> >>> -     of this well.  */
> > >> >>> +      /* Like cases shown in PR100740/102131, negtive step is not 
> > >> >>> safe.  */
> > >> >>> +      if (tree_int_cst_sign_bit (step))
> > >> >>> +    return false;
> > >> >>> +
> > >> >>>        if (code != NE_EXPR
> > >> >>>        && (TREE_CODE (step) != INTEGER_CST
> > >> >>>            || !iv0->no_overflow || !iv1->no_overflow))
> > >> >>
> > >> >> I think for NE_EXPR the sign is not relevant.  I think the key is
> > >> >> that we require that iv0->no_overflow and for non-equality checks
> > >> >> adjusting X + C1 < Y + C2 to X + C1 - C2 < Y is only valid if that
> > >> >> does not cause any overflow on either side.  On the LHS this is
> > >> >
> > >> > Hi Richard,
> > >> >
> > >> > Thanks a lot for your comments and ideas!
> > >> >
> > >> > Right! The adjusting is safe only if we can make sure there is
> > >> > no overflow/wrap on either side or say there is no overflow/wrap
> > >> > on three 'iv's: {X,C1}, {Y,C2} and {X, C1 - C2}.
> > Hi Richard,
> > 
> > Thanks for your quickly reply, and patient review!
> > >
> > > The point is that we may not change the iteration number at which
> > > overflow occurs since that alters the result of the < compare.
> > > Only if we know there is no overflow with the present expression
> > > during the loop evaluation we can do the transform and then only
> > > if we do not introduce overflow.
> > Exactly, this is also what I mean :)
> > 
> > >
> > > We are basically doing the transform that fold_comparison
> > > in fold-const.cc does:
> > >
> > >   /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
> > >      X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
> > >      the resulting offset is smaller in absolute value than the
> > >      original one and has the same sign.  */
> > >   if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
> > >       && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
> > > ...
> > >
> > This transform seems not the scenario which we are care about in
> > number_of_iterations_cond.
> > For example, 'X + 1 < Y + 4' ==> 'X < Y + 3' would be correct if
> > no overflow happen.
> > But for loop, the niter for '{X, 1} < {Y, 4}' would be totally
> > different with niter for '{X, 0} < {Y, 3}'.
> > for (iv0 = X, iv1 = Y; iv0 < iv1; iv0 += 1, iv1 += 4)
> >    in this loop, iv1 walks to overflow faster, step is 4.
> > vs.
> > for (iv0 = X, iv1 = Y; iv0 < iv1; iv1 += 3) (iv1 overflow slow)
> >    in this loop, iv1 overflows slower, step is 3.
> 
> Huh?  But we are _exactly_ doing this, analyzing {X, + 1} < {Y, + 4}
> as X < {Y, + 3} (well, OK, we're only trying {X, -3} which now
> fails - we should try the other way around as well).
> 
> > Actually, the transformation 'X + 1 < Y + 4' ==> 'X < Y + 3',
> > may not always correct.  e.g. for below code, X=6, and Y=2147483645
> > it may output "b0 + 1 < b1 + 4 is true".
> 
> But Y + 4 overflows with 2147483645 so X + 1 < Y + 4 invokes UB
> and we can ignore this situation.
> 
> > ```c
> > int __attribute__ ((noinline))
> > foo (int b0, int b1)
> > {
> >   return __builtin_printf ("b0 + 1 < b1 + 4 is %s\n",
> >                        b0 + 1 < b1 + 4 ? "true" : "false");
> > }
> > 
> > int main(int argc, char **argv)
> > {
> >   if (argc < 2)
> >     return -1;
> >   int b0 = atoi(argv[1]);
> >   int b1 = atoi(argv[2]);  
> >   return foo (b0, b1);
> > }
> > ```
> > >> > Or it may also ok if we can compute an assumption, under which
> > >> > all three ivs are not overflowed/wrapped.
> > >> >
> > >> >> only guaranteed if the absolute value of C1 - C2 is smaller than
> > >> >> that of C1 and if it has the same sign.
> > >> > I'm thinking this in another way:
> > >> > When trying to do this transform in number_of_iterations_cond,
> > >> > GT/GE is inverted to LT/LE, then the compare code would be:
> > >> > LT/LE or NE.
> > >> >
> > >> > For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is
> > >> > chasing iv1.  We would able to assume X < Y (may_be_zero would
> > >> > be set later via number_of_iterations_lt/le).
> > >> > 1. If C1 < C2, iv0 can never catch up iv1. For examples:
> > >> > {X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}.
> > >> > And there must be at least one overflow/wrap in iv0,iv1, or iv.
> > >> > This indicates, if the sign of (C1 - C1) is negative, then the
> > >> > transform would be incorrect.
> > >> > 2. If C1 > C2, we still need to make sure all the ivs (iv0,
> > >> > iv1 and combined iv) are not wrapped.
> > >> > For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up.
> > >> >    the assumption may like : (MAX-Y)/C2 > (Y-X)/(C1-C1)
> > >> There is still some cases: iv0 step is too large, then iv0 wraps
> > >> first, e.g. {MAX-5, 10} < {MAX-3, 1}. For this, the assumption
> > >> would need to and with (MAX-X)/C1 > (Y-X)/(C1-C1).
> > >> 
> > >> > For C1 < 0, {X,C1} should not down cross MIN
> > >> >    the assumption may like : (X-MIN)/-C1 > (Y-X)/(C1-C1)
> > >>   Also add the assumption: (Y-MIN)/-C2 > (Y-X)/(C1-C1)
> > >> 
> > >> > For C1 > 0 and C2 < 0, iv0 and iv1 are walking to each other,
> > >> > it would be almost safe.
> > >> For this case, we may still add the assumption to avoid wraping
> > >> at the first iteration.
> > >> 
> > >> BR,
> > >> Jiufu
> > >> 
> > >> >
> > >> > For NE, it seems more interesting. The transformation depends
> > >> > on 3 things: 1. the relation between X and Y; 2 the sign
> > >> > of (C1-C2); 3. if iv0 and iv1 can be equal finally.
> > >> > The 3rd one may be more special.
> > >> > The good news is, number_of_iterations_ne seems able to handle NE.
> > >> >
> > >> >>
> > >> >> With the same reasoning we then know the new IV0 doesn't overflow.
> > >> >>
> > >> >> So something like the following.  IIRC I've proposed sth similar
> > >> >> a while back.  I'm going to give it some testing, collecting
> > >> >> testcases from related PRs.
> > >> >>
> > >> >> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> > >> >> index b767056aeb0..74fa4f66ee2 100644
> > ...
> > >> >> +      if (TREE_CODE (step) != INTEGER_CST
> > >> >> +         || !iv0->no_overflow || !iv1->no_overflow)
> > >> >> +       {
> > >> > I was also trying to leverage no_overflow of iv0 and iv1. While it 
> > >> > seems
> > >> > the computation logic of no_overflow is related to the type of IV.  If 
> > >> > the
> > >> > type of IV is signed, the C semantics may be used, overflow in signed
> > >> > IV are treated UB, and then no_overflow would be true.
> > >> >
> > >> > For unsigned IV, no_overflow would be false, even for the cases which
> > >> > looks like:
> > >> > "{10, 2} < {20, 1}", which would be ok to compute niter.
> > >
> > > IIRC no_overflow is determined by SCEV which might also use niter
> > > analysis.  For the case of {10, +2} < {20, +1} there is no need to
> > > compute it as {10, +1} < 20 and we hopefully deal with this in
> > > other code paths (in fact with base and step all constant we
> > > can simply solve the linear equation for 'n' - maybe that's a
> > > capability we should add to number_of_iterations_cond).
> > 
> > Thanks for point this out.
> > Yes, for const base(s) and step(s), we have other code path
> > to deal with (e.g. loop_niter_by_eval).
> > 
> > For {10, +2}, what I really mean is about the no_overflow iv(s)
> > on unsigned.  Sorry the misleading words.
> > For no_overflow, it is set at some places, including
> > number_of_iterations_xxx. :),  Before number_of_iterations_xxx,
> > no_overflow could be calculated in simple_iv_with_niters:
> > ```c
> >   iv->no_overflow = !folded_casts && nowrap_type_p (type);
> > ```
> > nowrap_type_p checks if overflow is UB on type through macro
> > TYPE_OVERFLOW_UNDEFINED.  For signed, it is UB; for unsigned
> > it is different.
> 
> Yes.
> 
> > For example as below code, no_overflow is set as false for iv0
> > and iv1, and then niter was not computed quickly.
> > ```c
> > unsigned __attribute__ ((noinline))
> > foo (unsigned b0, unsigned b1)
> > {
> >   unsigned n = 0;
> >   for (; b0 < b1; b0 += 2, b1 += 1)
> >     n++;
> >   return n;
> > }
> 
> There's no difference to before my patch of course.  There are
> some code paths in number_of_iterations_lt that use assumptions
> to prove the combined IV does not wrap, just for this case
> we give up too early.  I'm currently looking at rectifying this
> with small incremental changes.

Like the one below which should handle the PR81196 case when
integer IVs are used.  I suspect we can do something similar
for IVs where we do not know the original overflow status
(we have to register assumptions for both original IVs _and_
the new adjusted one).

And as noted we can try rewriting to the other IV.

I do wonder how important these are and what improvements we need
to include in backports (I think we do want to fix the original issue
on branches).

Richard.

>From f46855709dd45603d18f2dcd8403f5b060c164f0 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguent...@suse.de>
Date: Tue, 25 Jan 2022 13:11:57 +0100
Subject: [PATCH] tree-optimization/104214 - improve IV analysis with integer
 IV compares
To: gcc-patches@gcc.gnu.org

For rewriting BASE0 + STEP0 cmp BASE1 + STEP1 as
BASE0 + STEP0 - STEP1 cmp BASE1 for signed integers we can use
niter assumptions to ensure that BASE0 + STEP0 - STEP1 does not
overflow instead of giving up when we cannot prove this
statically.

We can use the existing assert_no_overflow_lt for this and make it
efficient for LE_EXPR also by rewriting LE_EXPR IV compares to LT_EXPR earlier.

2022-01-25  Richard Biener  <rguent...@suse.de>

        PR tree-optimization/104214
        * tree-ssa-loop-niter.cc (number_of_iterations_le): Refactor
        into ...
        (number_of_iterations_le_to_lt): ... this, just doing
        the iv->base rewriting and assumption registering.
        (number_of_iterations_cond): Rewrite LE_EXPR into LT_EXPR
        earlier.  When rewriting BASE0 + STEP0 cmp BASE1 + STEP1
        as BASE0 + STEP0 - STEP1 cmp BASE1 would fail for LT_EXPR
        because of possible overflow register assumptions instead.

        * gcc.dg/vect/pr81196-3.c: New testcase variant.
---
 gcc/testsuite/gcc.dg/vect/pr81196-3.c | 12 +++++
 gcc/tree-ssa-loop-niter.cc            | 67 +++++++++++++++------------
 2 files changed, 49 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr81196-3.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr81196-3.c 
b/gcc/testsuite/gcc.dg/vect/pr81196-3.c
new file mode 100644
index 00000000000..bcdd815dc5d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr81196-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+void b (int *p, int j, int k)
+{
+  p = (int *)__builtin_assume_aligned(p, __BIGGEST_ALIGNMENT__);
+  int i = 0;
+  for(; j < k; ++j, --k)
+    p[i++] = 1;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index d33095b8e03..b5f3d4b4a8d 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -1721,17 +1721,17 @@ number_of_iterations_lt (class loop *loop, tree type, 
affine_iv *iv0,
   return true;
 }
 
-/* Determines number of iterations of loop whose ending condition
-   is IV0 <= IV1.  TYPE is the type of the iv.  The number of
-   iterations is stored to NITER.  EXIT_MUST_BE_TAKEN is true if
-   we know that this condition must eventually become false (we derived this
+/* Rewrite the IV0 <= IV1 condition to IV0 < IV1 by adjusting one of
+   the IVs bases.  TYPE is the type of the iv.  Assumptions are
+   recorded to NITER.  EXIT_MUST_BE_TAKEN is true if we know that this
+   condition must eventually become false (we derived this
    earlier, and possibly set NITER->assumptions to make sure this
-   is the case).  BNDS bounds the difference IV1->base - IV0->base.  */
+   is the case).  */
 
 static bool
-number_of_iterations_le (class loop *loop, tree type, affine_iv *iv0,
-                        affine_iv *iv1, class tree_niter_desc *niter,
-                        bool exit_must_be_taken, bounds *bnds)
+number_of_iterations_le_to_lt (tree type, affine_iv *iv0, affine_iv *iv1,
+                              class tree_niter_desc *niter,
+                              bool exit_must_be_taken)
 {
   tree assumption;
   tree type1 = type;
@@ -1777,10 +1777,7 @@ number_of_iterations_le (class loop *loop, tree type, 
affine_iv *iv0,
     iv0->base = fold_build2 (MINUS_EXPR, type1,
                             iv0->base, build_int_cst (type1, 1));
 
-  bounds_add (bnds, 1, type1);
-
-  return number_of_iterations_lt (loop, type, iv0, iv1, niter, 
exit_must_be_taken,
-                                 bnds);
+  return true;
 }
 
 /* Dumps description of affine induction variable IV to FILE.  */
@@ -1862,6 +1859,17 @@ number_of_iterations_cond (class loop *loop,
       code = swap_tree_comparison (code);
     }
 
+  /* If the loop exits immediately, there is nothing to do.  */
+  tree tem = fold_binary (code, boolean_type_node, iv0->base, iv1->base);
+  if (tem && integer_zerop (tem))
+    {
+      if (!every_iteration)
+       return false;
+      niter->niter = build_int_cst (unsigned_type_for (type), 0);
+      niter->max = 0;
+      return true;
+    }
+
   if (POINTER_TYPE_P (type))
     {
       /* Comparison of pointers is undefined unless both iv0 and iv1 point
@@ -1884,6 +1892,15 @@ number_of_iterations_cond (class loop *loop,
        exit_must_be_taken = true;
     }
 
+  /* Turn LE_EXPR to LT_EXPR, registering required assumptions.  */
+  if (code == LE_EXPR)
+    {
+      if (!number_of_iterations_le_to_lt (type, iv0, iv1, niter,
+                                         exit_must_be_taken))
+       return false;
+      code = LT_EXPR;
+    }
+
   /* We can handle cases which neither of the sides of the comparison is
      invariant:
 
@@ -1928,15 +1945,21 @@ number_of_iterations_cond (class loop *loop,
               pointer compares, we also know the resulting IV does not
               overflow.  */
            ;
-         else if (code != NE_EXPR)
-           return false;
          else
+           /* For LT_EXPR we register the assumptions necessary for
+              the adjusted IV0 to not overflow.  */
            iv0->no_overflow = false;
        }
 
       iv0->step = step;
       iv1->step = build_int_cst (step_type, 0);
       iv1->no_overflow = true;
+      if (code == LT_EXPR && !iv0->no_overflow)
+       {
+         if (!assert_no_overflow_lt (type, iv0, iv1, niter, step))
+           return false;
+         /* We will now have iv0->no_overflow == true again.  */
+       }
     }
 
   /* If the result of the comparison is a constant,  the loop is weird.  More
@@ -1945,17 +1968,6 @@ number_of_iterations_cond (class loop *loop,
   if (integer_zerop (iv0->step) && integer_zerop (iv1->step))
     return false;
 
-  /* If the loop exits immediately, there is nothing to do.  */
-  tree tem = fold_binary (code, boolean_type_node, iv0->base, iv1->base);
-  if (tem && integer_zerop (tem))
-    {
-      if (!every_iteration)
-       return false;
-      niter->niter = build_int_cst (unsigned_type_for (type), 0);
-      niter->max = 0;
-      return true;
-    }
-
   /* OK, now we know we have a senseful loop.  Handle several cases, depending
      on what comparison operator is used.  */
   bound_difference (loop, iv1->base, iv0->base, &bnds);
@@ -1994,11 +2006,6 @@ number_of_iterations_cond (class loop *loop,
                                     exit_must_be_taken, &bnds);
       break;
 
-    case LE_EXPR:
-      ret = number_of_iterations_le (loop, type, iv0, iv1, niter,
-                                    exit_must_be_taken, &bnds);
-      break;
-
     default:
       gcc_unreachable ();
     }
-- 
2.31.1

Reply via email to