On Wed, Aug 30, 2017 at 11:46 AM, Jackson Woodruff <jackson.woodr...@foss.arm.com> wrote: > On 08/29/2017 01:13 PM, Richard Biener wrote: >> >> On Tue, Aug 29, 2017 at 1:35 PM, Jackson Woodruff >> <jackson.woodr...@foss.arm.com> wrote: >>> >>> Hi all, >>> >>> Apologies again to those CC'ed, who (again) received this twice. >>> >>> Joseph: Yes you are correct. I misread the original thread, now fixed. >>> >>> Richard: I've moved the optimizations out of fold-const.c. One has been >>> replicated in match.pd, and the other (x / C +- y / C -> (x +- y) / C) >>> I've >>> deleted as it only introduced a new optimization when running with the >>> flags >>> '-O0 -funsafe-math-optimizations'. >> >> >> Hmm, how did you verify that, that it only adds sth with -O0 >> -funsafe-math-optimizations? > > > By checking with various flags, although not exhaustively. I looked for > reasons for the behavior in match.pd (explained below). > > I have also since discovered that the combinations of > '-funsafe-math-optimizations -frounding-math' (at all levels) and > '-fno-recriprocal-math -funsafe-math-optimizations' mean this pattern adds > something. > >> Is that because in GIMPLE the reassoc pass should do this transform? > > It's because the pattern that changes (X / C) -> X * (1 / C) is gated with > O1: > > (for cst (REAL_CST COMPLEX_CST VECTOR_CST) > (simplify > (rdiv @0 cst@1) > -> (if (optimize) > -> (if (flag_reciprocal_math > && !real_zerop (@1)) > (with > { tree tem = const_binop (RDIV_EXPR, type, build_one_cst (type), @1); > } > (if (tem) > (mult @0 { tem; } ))) > (if (cst != COMPLEX_CST) > (with { tree inverse = exact_inverse (type, @1); } > (if (inverse) > (mult @0 { inverse; } )))))))) > > > I've flagged the two lines that are particularly relevant to this.
So this means we go x / (C * y) -> (x / C) / y -> (x * (1/C)) / y why's that in any way preferable? I suppose this is again to enable the recip pass to detect / y (as opposed to / (C * y))? What's the reason to believe that / y is more "frequent"? > Removing this pattern, as I would expect, means that the divisions in the > above optimization (and the one further down) are not removed. > > So then there is the question of edge cases. This pattern is (ignoring the > second case) going to fail when const_binop returns null. Looking through > that function says that it will fail (for reals) when: > > - Either argument is null (not the case) > > - The operation is not in the list (PLUS_EXPR, > MINUS_EXPR, MULT_EXPR, RDIV_EXPR, MIN_EXPR, MAX_EXPR) > (again not the case) > > - We honor Signalling NaNs and one of the operands is a sNaN. > > - The operation is a division, and the second argument is zero > and dividing by 0.0 raises an exception. > > - The result is infinity and neither of the operands were infinity > and flag_trapping_math is set. > > - The result isn't exact and flag_rounding_math is set. > > > For (x / ( y * C) -> (x / C) / y), I will add some gating for each of these > so that the pattern is never executed if one of these would be the case. Why not transform this directly to (x * (1/C)) / y then (and only then)? That makes it obvious not two divisions prevail. That said, I'm questioning this canonicalization. I can come up with a testcase where it makes things worse: tem = x / (y * C); tem2 = z / (y * C); should generate rdivtmp = 1 / (y*C); tem = x *rdivtmp; tem2= z * rdivtmp; instead of rdivtmp = 1/y; tem = x * 1/C * rdivtmp; tem2 = z * 1/C * rdivtmp; > The additional cases where this isn't converted to a multiplication by the > reciprocal appear to be when -freciprocal-math is used, but we don't have > -fno-rounding-math, or funsafe-math-optimizations. > > For the removal of (x / C +- y / C -> (x +- y) / C), there is a trade-off of > whether it is worth having an extra pattern for these edge cases. On this > I'm not sure. Well, at least leave it in fold-const.c if you can't move it. >> >> You added >> >> +/* Simplify x / (- y) to -x / y. */ >> +(simplify >> + (rdiv @0 (negate @1)) >> + (rdiv (negate @0) @1)) > > > Sorry, this was unclear, the changes to fold const should be split up from > the additions to match.pd. > > This was one of the changes discussed before. The idea is to canonicalize > this so that y can be extracted in the cse_reciprocals pass. As said elsewhere I think cse_reciprocals needs a rewrite to not rely on arbitrary canonicalization to do its work but consider association opportunities globally with a focus on CSE. > >> >> for >> >> /* (-A) / (-B) -> A / B */ >> if (TREE_CODE (arg0) == NEGATE_EXPR && negate_expr_p (arg1)) >> return fold_build2_loc (loc, RDIV_EXPR, type, >> TREE_OPERAND (arg0, 0), >> negate_expr (arg1)); >> if (TREE_CODE (arg1) == NEGATE_EXPR && negate_expr_p (arg0)) >> return fold_build2_loc (loc, RDIV_EXPR, type, >> negate_expr (arg0), >> TREE_OPERAND (arg1, 0)); >> >> presumably? This isn't equivalent. It's more like >> >> (simplify >> (rdiv (negate_expr_p @0) (negate @1)) >> (rdiv (negate @0) @1)) >> (simplify >> (rdiv (negate @0) (negate_expr_p @1)) >> (rdiv @0 (negate @1))) >> >> and you should remove the corresponding fold-const.c code. >> >> Please do these changes independently to aid bisecting in case of issues. > > > I'll resubmit two different patches when I can get them separated. It should > also make it easier to review when it is clearer what belongs where. Thanks. >> >> (if (flag_reciprocal_math) >> - /* Convert (A/B)/C to A/(B*C) */ >> + /* Convert (A/B)/C to A/(B*C) where neither B nor C are constant. */ >> (simplify >> (rdiv (rdiv:s @0 @1) @2) >> - (rdiv @0 (mult @1 @2))) >> + (if (TREE_CODE (@1) != REAL_CST && TREE_CODE (@1) != REAL_CST) >> + (rdiv @0 (mult @1 @2)))) >> >> why? I guess to avoid ping-poning with > > > Yes, although there is a mistake there. It should be: > > (if (TREE_CODE (@1) != REAL_CST && TREE_CODE (@2) != REAL_CST) > > I'll fix that up. > >> >> + /* Simplify x / (C * y) to (x / C) / y where C is a constant. */ >> + (simplify >> + (rdiv @0 >> + (mult @1 REAL_CST@2)) >> + (rdiv (rdiv @0 @2) @1)) >> >> ? If so why not just disallow for @1 == REAL_CST? > > > The ping-ponging issue is there even when only one of the operands is a > constant (which of course it has to be for this pattern to apply). For > example, something like x / (y * 3), with '-O0 -ffast-math', > when the reciprocal isn't computed, ping pongs back and forth. > > I'm not sure that it can be fixed without a condition on the pattern already > there. > > >> >>> On O1 and up, the pattern that replaces 'x / C' with 'x * (1 / C)' >>> is enabled and then the pattern is covered by that and >>> (x * C +- y * C -> C * (x +- y)) (which is already present in match.pd) >>> >>> I have also updated the testcase for those optimizations to use 'O1' to >>> avoid that case. >>> >>> >>> On 08/24/2017 10:06 PM, Jeff Law wrote: >>>> >>>> >>>> On 08/17/2017 03:55 AM, Wilco Dijkstra wrote: >>>>> >>>>> >>>>> Richard Biener wrote: >>>>>> >>>>>> >>>>>> On Tue, Aug 15, 2017 at 4:11 PM, Wilco Dijkstra >>>>>> <wilco.dijks...@arm.com> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Richard Biener wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> We also change the association of >>>>>>>>> >>>>>>>>> x / (y * C) -> (x / C) / y >>>>>>>>> >>>>>>>>> If C is a constant. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Why's that profitable? >>>>>>> >>>>>>> >>>>>>> >>>>>>> It enables (x * C1) / (y * C2) -> (x * C1/C2) / y for example. >>>>>>> Also 1/y is now available to the reciprocal optimization, see >>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026 for details. >>>>>> >>>>>> >>>>>> >>>>>> Sure, but on its own it's going to be slower. So this isn't the >>>>>> correct way to enable those followup transforms. >>>>> >>>>> >>>>> >>>>> How can it be any slower? It's one division and one multiply in both >>>>> cases. >>>> >>>> >>>> x / (y * C) -> (x / C) / y >>>> >>>> Goes from one division and one multiplication to two divisions. I'm >>>> guessing that's what Richi is (reasonably) concerned about. >>>> >>>> So it may be the case that we need more complex pattern matching here >>>> for when to perform the first transformation to ultimately enable the >>>> second. >>>> >>> >>> The only case where we don't remove the division but still execute this >>> pattern is when run with'-O0 -freciprocal-math'. >>> >>> As long as we have 'O1' or greater (and -freciprocal-math), that is >>> enough >>> to enable the removal of the reciprocal. >> >> >> I don't see this. I presume you mean this happens in the recip pass? > > > It's one of the match.pd patterns that actually does the extraction since C > is always constant. I've copied in the pattern above in response to one of > your previous comments. > >> But we don't optimize this when optimizing for size (but the above pattern >> > still applies) or when targetm.min_divisions_for_recip_mul is too large > > > It was my understanding that this is used to specify the number of divisions > you have to be able to replace with a multiplication before it is worthwhile > inserting an extra multiplication. > > So for situations like this, I think that this is not quite what is being > measured, because there isn't an extra multiplication being inserted? > > >> >> So I still think this is the wrong place to do this and instead the recip >> pass should be extended. >> >> >>> >>> Jackson. >>> >>>> >>>> Jeff >>>> >>> >