On Thu, Mar 8, 2012 at 3:29 PM, Ulrich Weigand <uweig...@de.ibm.com> wrote: > Richard Guenther wrote: >> On Tue, Feb 28, 2012 at 4:10 PM, Ulrich Weigand <uweig...@de.ibm.com> wrote: >> > I'll still need to do proper testing and benchmarking, but I thought >> > I'd post the patch anyway just as a heads-up ... >> > >> > Does this look reasonable to you? >> >> Yes, that looks reasonable. Though instead of unsigned_type_for >> please use build_nonstandard_integer_type instead (if the type >> is not already unsigned). unsigned_type_for does not necessarily >> preserve type-precision and may even return NULL. > > OK, I've changed the patch to use build_nonstandard_integer_type, and it > still fixes the original problem. However, on further examination it > turns out that this "fix" is not because the expression is actually > simplified at tree level. Rather, it is just that because of some > minor reassociation (the +1 is moved to the end), the *RTL* optimizers > now see a (just as complex but) slightly different RTL sequence, and > therefore combine now just happens to be able to fold everything > together (using the RTL-level simplify_plus_minus machinery). > > Even worse, the patch causes a number of regressions: > >> FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "\\+" 0 >> FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "n_. \\* n_." >> 1 >> FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-4.c scan-tree-dump-times vect >> "OUTER LOOP VECTORIZED." 1 > > > The loop-15.c problem is probably just a missed optimization > elsewhere that can be fixed. The problem is that a loop-end > value used to be computed as "(n - 1) * n + n" which got > simplified via fold_plusminus_mult_expr to "n * n". When > computing in unsigned, however, the expression becomes something > along the lines of > > "[...] * (unsigned) n + (unsigned) n" > > and the top-level of fold_binary_loc removes the *outermost* NOP_EXPRs > resulting in > > "[...] * (unsigned) n + n" > > which fold_plusminus_mult_expr no longer recognizes as something > it can handle (since "(unsigned) n" is not operand_equal_p to "n"). > > I think this can be fixed by having fold_plusminus_mult_expr call > STRIP_NOPS on sub-operands, as is already done elsewhere in fold_binary_loc. > > This basically fixes the test case (due to some remaining casts, the > scan-tree-dump patterns still don't match as-is, but the expected code > is again generated). > > > The no-scevccp-noreassoc-outer-4.c problem is more difficult, and would > seem to point to a fundamental problem with performing the computation > in unsigned mode. In this case, the end value of an inner loop forms > part of a scalar evolution of the outer loop. When computing the end > value in unsigned type, that scalar evolution is no longer detected. > In part, this is because the scalar evolution machinery might need to > be improved in handling casts. In particular, in some places it only > does STRIP_USELESS_TYPE_CONVERSION; changing those to STRIP_NOPS makes > it detect the evolution again. But I'm not fully convinced this is > a safe change ... In any case, this still doesn't make the outer > loop vectorizable, since it is no longer provably finite. Without > the patch, this could be proven *because* the computation of the > inner loop's end value used signed arithmetic which cannot overflow. > By performing the computation in unsigned, this information is in > fact lost. > > > This would seem to indicate that unconditionally using unsigned > arithmetic even if not strictly necessary might not always be > the best solution either. I've gone back and looked that at the > original problem that > > (start + 1) + (int)((unsigned int) ~start + (unsigned int) end) > > is not simplified by fold-const. > > Interestingly enough, it *is* valid to simplify this expression > to just plain "end", even with signed arithmetic, since the > transformation does not introduce any new potential overflows. > > This is actually recognized in fold_binary_loc, but only one special > case. See the code around the following comment: > /* The only case we can still associate with two variables > is if they are the same, modulo negation. */ > > Unfortunately, this handles only A and -A; it doesn't recognize that > A+1 is in fact the negation of ~A ... > > There is also code in tree-ssa-forwprop.c:associate_plusminus that > handles quite a number of special cases where re-association *is* > allowed even for signed arithmetic: > > (A +- B) - A -> +- B > (A +- B) -+ B -> A > (CST +- A) +- CST -> CST +- A > (A + CST) +- CST -> A + CST > ~A + A -> -1 > ~A + 1 -> -A > A - (A +- B) -> -+ B > A +- (B +- A) -> +- B > CST +- (CST +- A) -> CST +- A > CST +- (A +- CST) -> CST +- A > A + ~A -> -1 > > This handles some cases involving ~, but still not quite the case > we need for this expression. In addition, the forward propagtion logic > doesn't seem to handle casts very well (as opposed to fold-const, which > makes liberal use of STRIP_NOPS). > > > I'm wondering where to go from there: > > - Why are those special re-association cases handled only in forwprop, > and not in fold-cost? I would have expected forwprop to just propagate > subexpressions into a tree and then use fold-const to actually simplify ...
We are moving away from GENERIC expression simplification towards simplifications/combines on GIMPLE SSA (Andrew Pinski is working on this a lot on his branch at the moment). > - Should I try to improve forwprop to handle casts and additional re- > association cases until it handles the above expression? Yes, ideally by trying to sub-divide this task into separate profitable transforms. Maybe Andrew can chime in here? > - Or should the logic in fold-const be improved to add additional cases > of re-association? I'd say no. > Thanks for any further suggestions! > > > I've attached some of the changes mentioned above for reference. > > Bye, > Ulrich > > > Patch: Compute loop-end value using unsigned type. > > Index: gcc/tree-scalar-evolution.c > =================================================================== > --- gcc/tree-scalar-evolution.c (revision 184625) > +++ gcc/tree-scalar-evolution.c (working copy) > @@ -474,11 +474,22 @@ > return chrec_dont_know; > else > { > + tree type = TREE_TYPE (evolution_fn), utype = type; > tree res; > > + /* Since the number of iterations is always unsigned, we perform > + the computation in an unsigned type (if feasible) to allow > for > + improved simplification of subexpressions. */ > + if ((INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)) > + || POINTER_TYPE_P (type)) > + utype = build_nonstandard_integer_type > + (TYPE_PRECISION (type), 1); > + > /* evolution_fn is the evolution function in LOOP. Get > its value in the nb_iter-th iteration. */ > - res = chrec_apply (inner_loop->num, evolution_fn, nb_iter); > + res = chrec_convert (utype, evolution_fn, NULL); > + res = chrec_apply (inner_loop->num, res, nb_iter); > + res = chrec_convert (type, res, NULL); > > if (chrec_contains_symbols_defined_in_loop (res, loop->num)) > res = instantiate_parameters (loop, res); > > > > Patch: Use STRIP_NOPS on subexpressions in fold_plusminus_mult_expr > > Index: gcc/fold-const.c > =================================================================== > --- gcc/fold-const.c (revision 184625) > +++ gcc/fold-const.c (working copy) > @@ -7081,6 +7081,8 @@ > { > arg00 = TREE_OPERAND (arg0, 0); > arg01 = TREE_OPERAND (arg0, 1); > + STRIP_NOPS (arg00); > + STRIP_NOPS (arg01); > } > else if (TREE_CODE (arg0) == INTEGER_CST) > { > @@ -7099,6 +7101,8 @@ > { > arg10 = TREE_OPERAND (arg1, 0); > arg11 = TREE_OPERAND (arg1, 1); > + STRIP_NOPS (arg10); > + STRIP_NOPS (arg11); > } > else if (TREE_CODE (arg1) == INTEGER_CST) > { > > > > Patch: Use STRIP_NOPS when following scalar evolutions > > Index: gcc/tree-scalar-evolution.c > =================================================================== > --- gcc/tree-scalar-evolution.c (revision 184625) > +++ gcc/tree-scalar-evolution.c (working copy) > @@ -1154,8 +1165,8 @@ > rhs0 = TREE_OPERAND (expr, 0); > rhs1 = TREE_OPERAND (expr, 1); > type = TREE_TYPE (rhs0); > - STRIP_USELESS_TYPE_CONVERSION (rhs0); > - STRIP_USELESS_TYPE_CONVERSION (rhs1); > + STRIP_NOPS (rhs0); > + STRIP_NOPS (rhs1); > res = follow_ssa_edge_binary (loop, at_stmt, type, rhs0, code, rhs1, > halting_phi, evolution_of_loop, limit); > break; > @@ -1168,8 +1179,8 @@ > rhs0 = TREE_OPERAND (expr, 0); > rhs1 = TREE_OPERAND (expr, 1); > type = TREE_TYPE (rhs0); > - STRIP_USELESS_TYPE_CONVERSION (rhs0); > - STRIP_USELESS_TYPE_CONVERSION (rhs1); > + STRIP_NOPS (rhs0); > + STRIP_NOPS (rhs1); > res = follow_ssa_edge_binary (loop, at_stmt, type, > rhs0, POINTER_PLUS_EXPR, rhs1, > halting_phi, evolution_of_loop, limit); > > > > -- > Dr. Ulrich Weigand > GNU Toolchain for Linux on System z and Cell BE > ulrich.weig...@de.ibm.com >