On Fri, Nov 25, 2016 at 2:46 PM, Richard Biener <richard.guent...@gmail.com> wrote> On Wed, Nov 16, 2016 at 4:54 PM, Robin Dapp <rd...@linux.vnet.ibm.com> wrote: >> Found some time to look into this again. >> >>> Index: tree-ssa-propagate.c >>> =================================================================== >>> --- tree-ssa-propagate.c (revision 240133) >>> +++ tree-ssa-propagate.c (working copy) >>> @@ -1105,10 +1105,10 @@ substitute_and_fold_dom_walker::before_d >>> /* Replace real uses in the statement. */ >>> did_replace |= replace_uses_in (stmt, get_value_fn); >>> >>> - /* If we made a replacement, fold the statement. */ >>> - if (did_replace) >>> + /* Fold the statement. */ >>> + if (fold_stmt (&i, follow_single_use_edges)) >>> { >>> - fold_stmt (&i, follow_single_use_edges); >>> + did_replace = true; >>> stmt = gsi_stmt (i); >>> } >>> >>> this would need compile-time cost evaluation (and avoid the tree-vrp.c >>> folding part >>> of your patch). >> >> This causes an ICE and bootstrap errors with newer revisions. I tested >> my patch on r240691 where it still works. How can this be done properly now? > > Not sure, I'd have to investigate. It should still just work > (throwing off a bootstrap > with just that change over the weekend).
Index: gcc/tree-ssa-propagate.c =================================================================== --- gcc/tree-ssa-propagate.c (revision 242875) +++ gcc/tree-ssa-propagate.c (working copy) @@ -1065,13 +1065,15 @@ substitute_and_fold_dom_walker::before_d /* Replace real uses in the statement. */ did_replace |= replace_uses_in (stmt, get_value_fn); + if (did_replace) + gimple_set_modified (stmt, true); /* If we made a replacement, fold the statement. */ - if (did_replace) + if (fold_stmt (&i, follow_single_use_edges)) { - fold_stmt (&i, follow_single_use_edges); stmt = gsi_stmt (i); gimple_set_modified (stmt, true); + did_replace = true; } /* Some statements may be simplified using propagator works fine for me. >>> OTOH given that you use this to decide whether you can use a combined >>> constant >>> that doesn't look necessary (if the constant is correct, that is) -- >>> what you need >>> to make sure is that the final operation, (T)(A) +- CST, does not overflow >>> if ! TYPE_OVERFLOW_WRAPS and there wasn't any overflow in the >>> original operation. I think that always holds, thus the combine_ovf check >>> looks >>> superfluous to me. >> >> Removed the check and addressed the other remarks. >> >>> So now we know that for (T)(X + CST1) + CST2, (T)CST1 + CST2 >>> does not overflow. But we do not really care for that, we want to know >>> whether (T)X + CST' might invoke undefined behavior when the original >>> expression did not. This involves range information on X. I don't >>> see how we ensure this here. >> >> I guess I'm still missing an undefined behavior case. In which case can >> (T)X + CST' trigger undefined behavior where the original statement did >> not? I see the need for checking in the second pattern ((T)(X) + CST' -> >> (T)(X + CST')), of course. > > Looking at > > + /* ((T)(A +- CST)) +- CST -> (T)(A) +- CST) */ > +#if GIMPLE > + (for outer_op (plus minus) > + (for inner_op (plus minus) > + (simplify > + (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2) > + (if (TREE_CODE (type) == INTEGER_TYPE && > + TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@3))) > > so the conversion (T) is widening or sign-changing. (&& go to the next line) > > If A + CST overflows we cannot do the transform (you check that > with extract_range_from_binary_expr setting 'range_split'). > > If A + CST does not overflow but is unsigned and we are just changing sign > (precision ==) then (T)A + (CST + CST) might overflow. Consider > (int)(INT_MAX + 1) + 1 -> INT_MAX + 2. I think here the important part > is whether A + CST fits in T for the case where we just change the type > to a type with !TYPE_OVERFLOW_WRAPS. Certainly restricting to > widenings would avoid the issue. > >>> But that's "easily fixable" by computing it in unsigned arithmetic, that is >>> doing >>> >>> (long)(a + 2) + LONG_MAX -> (long)((unsigned long)a + (LONG_MAX + 2)) >> >> Does this also work if (unsigned long)a + (LONG_MAX + 2) does not fit >> into [0,LONG_MAX]? IIRC (unsigned long)(LONG_MAX + 2) is >> implementation-defined and not undefined so it should work? > > Yes, implementation-defined beavior is fine. > >> Revised patch version attached. One thing I'm still not sure about is >> the handling of sth. like (unsigned long)(a + UINT_MAX) + 1 for a = 0. >> In the current patch version I always do a sign-extend of the first >> constant (UINT_MAX here) which seems to cause no problems in the >> testsuite and some custom tests still worked. > > + /* Sign-extend @1 to TYPE. */ > + w1 = w1.from (w1, TYPE_PRECISION (type), SIGNED); > > not sure why you do always sign-extend. If the inner op is unsigned > and we widen then that's certainly bogus considering your UINT_MAX > example above. Does > > w1 = w1.from (w1, TYPE_PRECISION (type), TYPE_SIGN > (inner_type)); > > not work for some reason? > > + /* Combine in outer, larger type. */ > + bool combine_ovf = true; > + combined_cst = wi::add (w1, w2, SIGNED, &combine_ovf); > > as you ignore combine_ovf you can simply use > > combined_cst = wi::add (w1, w2); > > + /* Convert combined constant to tree of outer type if > + there was no value range split in the original operation. */ > + if (!range_split) > + { > > I'd say you want to condition on range_split early, like with > > bool range_split; > if (TYPE_OVERFLOW_UNDEFINED (inner_type) > || ! (extract_range_from_binary_expr (...., &range_split), > range_split)) > { > ... > } > > and avoid all the work if you throw it away anyway. > >> Can UINT_MAX, -1 and >> similar cases be disambiguated (and correctly converted to the outer >> type) when done in unsigned arithmetic? > > see above how I expect it to "just work". > >> >> Thinking about the second pattern, on s390x it introduces more casts >> than just using the first one (e.g. in cases where the value will be >> sign-extended after the operation which wouldn't have happened when >> performing the operation in the larger type. Can we catch this >> with a cost function? > > Not sure, if it's really too bad we can try merging the two patterns again, > thus have (T)(A +- CST) +- CST -> (T)(A +- CST) again. Now that both > patterns look simple enough that should be possible without too much hassle. > > + (if (cst) > + (outer_op (convert { @0; }) { cst; })) > > you can write this as > > (if (cst) > (outer_op (convert @0) { cst})) > > no need for the {}s around @0. > > + /* ((T)(A)) +- CST -> (T)(A +- CST) */ > +#if GIMPLE > + (for outer_op (plus minus) > + (simplify > + (outer_op (convert @0) INTEGER_CST@2) > + (if (TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > + && TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > + && TREE_CODE (type) == INTEGER_TYPE) > + /* Perform binary operation inside the cast if the constant fits > + and there is no overflow. */ > + (with > + { > + tree cst_inner; > + bool range_split = true; > + > + wide_int cst = @2; > + cst_inner = wide_int_to_tree (TREE_TYPE (@0), cst); > > not sure if that really does what you want (you're truncating the constant > to the smaller type). I think you want to check > > int_fits_type_p (TREE_TYPE (@0), @2) > > and then simply use > > tree cst_inner = fold_convert (TREE_TYPE (@0), @2); > > as you do not allow a simple sign-change here if you merge the patterns > disallowing it in the above one would simplify things as well. > > + value_range vr = VR_INITIALIZER; > + extract_range_from_binary_expr (&vr, outer_op, TREE_TYPE (@0), > + @0, cst_inner, &range_split); > > >> >> On a side note: Can/should VRP infer ranges assuming no undefined >> behavior will take place when -fstrict-overflow is in use? I.e. >> inferring ~[INT_MIN,INT_MIN] for (long)(a - 1)? Would this even make sense? > > It could do that but it does not at the moment. > > Richard. > >> Regards >> Robin