Re: [PATCH] Tree-level fix for PR 69526

Richard Biener Fri, 25 Nov 2016 05:47:59 -0800

On Wed, Nov 16, 2016 at 4:54 PM, Robin Dapp <rd...@linux.vnet.ibm.com> wrote:
> Found some time to look into this again.
>
>> Index: tree-ssa-propagate.c
>> ===================================================================
>> --- tree-ssa-propagate.c        (revision 240133)
>> +++ tree-ssa-propagate.c        (working copy)
>> @@ -1105,10 +1105,10 @@ substitute_and_fold_dom_walker::before_d
>>        /* Replace real uses in the statement.  */
>>        did_replace |= replace_uses_in (stmt, get_value_fn);
>>
>> -      /* If we made a replacement, fold the statement.  */
>> -      if (did_replace)
>> +      /* Fold the statement.  */
>> +      if (fold_stmt (&i, follow_single_use_edges))
>>         {
>> -         fold_stmt (&i, follow_single_use_edges);
>> +         did_replace = true;
>>           stmt = gsi_stmt (i);
>>         }
>>
>> this would need compile-time cost evaluation (and avoid the tree-vrp.c
>> folding part
>> of your patch).
>
> This causes an ICE and bootstrap errors with newer revisions. I tested
> my patch on r240691 where it still works. How can this be done properly now?


Not sure, I'd have to investigate.  It should still just work
(throwing off a bootstrap
with just that change over the weekend).

>> OTOH given that you use this to decide whether you can use a combined 
>> constant
>> that doesn't look necessary (if the constant is correct, that is) --
>> what you need
>> to make sure is that the final operation, (T)(A) +- CST, does not overflow
>> if ! TYPE_OVERFLOW_WRAPS and there wasn't any overflow in the
>> original operation.  I think that always holds, thus the combine_ovf check 
>> looks
>> superfluous to me.
>
> Removed the check and addressed the other remarks.
>
>> So now we know that for (T)(X + CST1) + CST2, (T)CST1 + CST2
>> does not overflow.  But we do not really care for that, we want to know
>> whether (T)X + CST' might invoke undefined behavior when the original
>> expression did not.  This involves range information on X.  I don't
>> see how we ensure this here.
>
> I guess I'm still missing an undefined behavior case. In which case can
> (T)X + CST' trigger undefined behavior where the original statement did
> not? I see the need for checking in the second pattern ((T)(X) + CST' ->
> (T)(X + CST')), of course.

Looking at

+  /* ((T)(A +- CST)) +- CST -> (T)(A) +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+     (for inner_op (plus minus)
+       (simplify
+        (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+          (if (TREE_CODE (type) == INTEGER_TYPE &&
+               TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@3)))

so the conversion (T) is widening or sign-changing.  (&& go to the next line)

If A + CST overflows we cannot do the transform (you check that
with extract_range_from_binary_expr setting 'range_split').

If A + CST does not overflow but is unsigned and we are just changing sign
(precision ==) then (T)A + (CST + CST) might overflow.  Consider
(int)(INT_MAX + 1) + 1 -> INT_MAX + 2.  I think here the important part
is whether A + CST fits in T for the case where we just change the type
to a type with !TYPE_OVERFLOW_WRAPS.  Certainly restricting to
widenings would avoid the issue.

>> But that's "easily fixable" by computing it in unsigned arithmetic, that is
>> doing
>>
>>   (long)(a + 2) + LONG_MAX -> (long)((unsigned long)a + (LONG_MAX + 2))
>
> Does this also work if (unsigned long)a + (LONG_MAX + 2) does not fit
> into [0,LONG_MAX]? IIRC (unsigned long)(LONG_MAX + 2) is
> implementation-defined and not undefined so it should work?

Yes, implementation-defined beavior is fine.

> Revised patch version attached. One thing I'm still not sure about is
> the handling of sth. like (unsigned long)(a + UINT_MAX) + 1 for a = 0.
> In the current patch version I always do a sign-extend of the first
> constant (UINT_MAX here) which seems to cause no problems in the
> testsuite and some custom tests still worked.

+             /* Sign-extend @1 to TYPE.  */
+             w1 = w1.from (w1, TYPE_PRECISION (type), SIGNED);

not sure why you do always sign-extend.  If the inner op is unsigned
and we widen then that's certainly bogus considering your UINT_MAX
example above.  Does

               w1 = w1.from (w1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));

not work for some reason?

+             /* Combine in outer, larger type.  */
+             bool combine_ovf = true;
+             combined_cst = wi::add (w1, w2, SIGNED, &combine_ovf);

as you ignore combine_ovf you can simply use

  combined_cst = wi::add (w1, w2);

+             /* Convert combined constant to tree of outer type if
+                there was no value range split in the original operation.  */
+             if (!range_split)
+               {

I'd say you want to condition on range_split early, like with

    bool range_split;
    if (TYPE_OVERFLOW_UNDEFINED (inner_type)
        || ! (extract_range_from_binary_expr (...., &range_split), range_split))
      {
...
      }

and avoid all the work if you throw it away anyway.

> Can UINT_MAX, -1 and
> similar cases be disambiguated (and correctly converted to the outer
> type) when done in unsigned arithmetic?

see above how I expect it to "just work".

>
> Thinking about the second pattern, on s390x it introduces more casts
> than just using the first one (e.g. in cases where the value will be
> sign-extended after the operation which wouldn't have happened when
> performing the operation in the larger type. Can we catch this
> with a cost function?

Not sure, if it's really too bad we can try merging the two patterns again,
thus have (T)(A +- CST) +- CST -> (T)(A +- CST) again.  Now that both
patterns look simple enough that should be possible without too much hassle.

+       (if (cst)
+        (outer_op (convert { @0; }) { cst; }))

you can write this as

    (if (cst)
     (outer_op (convert @0) { cst}))

no need for the {}s around @0.

+  /* ((T)(A)) +- CST -> (T)(A +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+    (simplify
+     (outer_op (convert @0) INTEGER_CST@2)
+      (if (TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+          && TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
+          && TREE_CODE (type) == INTEGER_TYPE)
+       /* Perform binary operation inside the cast if the constant fits
+         and there is no overflow.  */
+       (with
+       {
+         tree cst_inner;
+         bool range_split = true;
+
+         wide_int cst = @2;
+         cst_inner = wide_int_to_tree (TREE_TYPE (@0), cst);

not sure if that really does what you want (you're truncating the constant
to the smaller type).  I think you want to check

     int_fits_type_p (TREE_TYPE (@0), @2)

and then simply use

         tree cst_inner = fold_convert (TREE_TYPE (@0), @2);

as you do not allow a simple sign-change here if you merge the patterns
disallowing it in the above one would simplify things as well.

+         value_range vr = VR_INITIALIZER;
+         extract_range_from_binary_expr (&vr, outer_op, TREE_TYPE (@0),
+                                         @0, cst_inner, &range_split);


>
> On a side note: Can/should VRP infer ranges assuming no undefined
> behavior will take place when -fstrict-overflow is in use? I.e.
> inferring ~[INT_MIN,INT_MIN] for (long)(a - 1)? Would this even make sense?

It could do that but it does not at the moment.

Richard.

> Regards
>  Robin

Re: [PATCH] Tree-level fix for PR 69526

Reply via email to