Jiufu Guo <guoji...@linux.ibm.com> writes:

> Richard Biener <rguent...@suse.de> writes:
>
>> On Thu, 13 Jan 2022, Jiufu Guo wrote:
> ...
>>
>>> -      /* No need to check sign of the new step since below code takes care
>>> -    of this well.  */
>>> +      /* Like cases shown in PR100740/102131, negtive step is not safe.  */
>>> +      if (tree_int_cst_sign_bit (step))
>>> +   return false;
>>> +
>>>        if (code != NE_EXPR
>>>       && (TREE_CODE (step) != INTEGER_CST
>>>           || !iv0->no_overflow || !iv1->no_overflow))
>>
>> I think for NE_EXPR the sign is not relevant.  I think the key is
>> that we require that iv0->no_overflow and for non-equality checks
>> adjusting X + C1 < Y + C2 to X + C1 - C2 < Y is only valid if that
>> does not cause any overflow on either side.  On the LHS this is
>
> Hi Richard,
>
> Thanks a lot for your comments and ideas!
>
> Right! The adjusting is safe only if we can make sure there is
> no overflow/wrap on either side or say there is no overflow/wrap
> on three 'iv's: {X,C1}, {Y,C2} and {X, C1 - C2}.
>
> Or it may also ok if we can compute an assumption, under which
> all three ivs are not overflowed/wrapped.
>
>> only guaranteed if the absolute value of C1 - C2 is smaller than
>> that of C1 and if it has the same sign.
> I'm thinking this in another way:
> When trying to do this transform in number_of_iterations_cond,
> GT/GE is inverted to LT/LE, then the compare code would be:
> LT/LE or NE.
>
> For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is
> chasing iv1.  We would able to assume X < Y (may_be_zero would
> be set later via number_of_iterations_lt/le).
> 1. If C1 < C2, iv0 can never catch up iv1. For examples:
> {X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}.
> And there must be at least one overflow/wrap in iv0,iv1, or iv.
> This indicates, if the sign of (C1 - C1) is negative, then the
> transform would be incorrect.
> 2. If C1 > C2, we still need to make sure all the ivs (iv0,
> iv1 and combined iv) are not wrapped.
> For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up.
>    the assumption may like : (MAX-Y)/C2 > (Y-X)/(C1-C1)
There is still some cases: iv0 step is too large, then iv0 wraps
first, e.g. {MAX-5, 10} < {MAX-3, 1}. For this, the assumption
would need to and with (MAX-X)/C1 > (Y-X)/(C1-C1).

> For C1 < 0, {X,C1} should not down cross MIN
>    the assumption may like : (X-MIN)/-C1 > (Y-X)/(C1-C1)
  Also add the assumption: (Y-MIN)/-C2 > (Y-X)/(C1-C1)

> For C1 > 0 and C2 < 0, iv0 and iv1 are walking to each other,
> it would be almost safe.
For this case, we may still add the assumption to avoid wraping
at the first iteration.

BR,
Jiufu

>
> For NE, it seems more interesting. The transformation depends
> on 3 things: 1. the relation between X and Y; 2 the sign
> of (C1-C2); 3. if iv0 and iv1 can be equal finally.
> The 3rd one may be more special.
> The good news is, number_of_iterations_ne seems able to handle NE.
>
>>
>> With the same reasoning we then know the new IV0 doesn't overflow.
>>
>> So something like the following.  IIRC I've proposed sth similar
>> a while back.  I'm going to give it some testing, collecting
>> testcases from related PRs.
>>
>> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
>> index b767056aeb0..74fa4f66ee2 100644
>> --- a/gcc/tree-ssa-loop-niter.cc
>> +++ b/gcc/tree-ssa-loop-niter.cc
>> @@ -1890,17 +1890,28 @@ number_of_iterations_cond (class loop *loop,
>>        tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
>>                                            iv0->step, iv1->step);
>>  
>> -      /* No need to check sign of the new step since below code takes 
>> care
>> -        of this well.  */
>> -      if (code != NE_EXPR
>> -         && (TREE_CODE (step) != INTEGER_CST
>> -             || !iv0->no_overflow || !iv1->no_overflow))
>> -       return false;
>> +      /* For code other than NE_EXPR we have to ensure moving the 
>> evolution
>> +        of IV1 to that of IV0 does not introduce overflow.  */
>> +      if (TREE_CODE (step) != INTEGER_CST
>> +         || !iv0->no_overflow || !iv1->no_overflow)
>> +       {
> I was also trying to leverage no_overflow of iv0 and iv1. While it seems
> the computation logic of no_overflow is related to the type of IV.  If the
> type of IV is signed, the C semantics may be used, overflow in signed
> IV are treated UB, and then no_overflow would be true.
>
> For unsigned IV, no_overflow would be false, even for the cases which
> looks like:
> "{10, 2} < {20, 1}", which would be ok to compute niter.
>
> BR,
> Jiufu
>
>> +         if (code != NE_EXPR)
>> +           return false;
>> +         iv0->no_overflow = false;
>> +       }
>> +      /* If the new step of IV0 has changed sign or is of greater
>> +        magnitude then we do not know whether IV0 does overflow
>> +        and thus the transform is not valid for code other than NE_EXPR  
>> */
>> +      else if (tree_int_cst_sign_bit (step) != tree_int_cst_sign_bit 
>> (iv0->step)
>> +              || wi::gtu_p (wi::abs (wi::to_widest (step)),
>> +                            wi::abs (wi::to_widest (iv0->step))))
>> +       {
>> +         if (code != NE_EXPR)
>> +           return false;
>> +         iv0->no_overflow = false;
>> +       }
>>  
>>        iv0->step = step;
>> -      if (!POINTER_TYPE_P (type))
>> -       iv0->no_overflow = false;
>> -
>>        iv1->step = build_int_cst (step_type, 0);
>>        iv1->no_overflow = true;
>>      }
>>
>>
>>> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr100740.c
>>> b/gcc/testsuite/gcc.c-torture/execute/pr100740.c
>>> new file mode 100644
>>> index 00000000000..381cdeb947a
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.c-torture/execute/pr100740.c
>>> @@ -0,0 +1,13 @@
>>> +/* PR tree-optimization/100740 */
>>> +
>>> +unsigned a, b;
>>> +int
>>> +main ()
>>> +{
>>> +  unsigned c = 0;
>>> +  for (a = 0; a < 2; a++)
>>> +    for (b = 0; b < 2; b++)
>>> +      if (++c < a)
>>> +   __builtin_abort ();
>>> +  return 0;
>>> +}
>>> 

Reply via email to