Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

Richard Biener Wed, 30 Aug 2017 05:47:22 -0700

On Wed, Aug 30, 2017 at 11:46 AM, Jackson Woodruff
<jackson.woodr...@foss.arm.com> wrote:
> On 08/29/2017 01:13 PM, Richard Biener wrote:
>>
>> On Tue, Aug 29, 2017 at 1:35 PM, Jackson Woodruff
>> <jackson.woodr...@foss.arm.com> wrote:
>>>
>>> Hi all,
>>>
>>> Apologies again to those CC'ed, who (again) received this twice.
>>>
>>> Joseph: Yes you are correct. I misread the original thread, now fixed.
>>>
>>> Richard: I've moved the optimizations out of fold-const.c. One has been
>>> replicated in match.pd, and the other (x / C +- y / C -> (x +- y) / C)
>>> I've
>>> deleted as it only introduced a new optimization when running with the
>>> flags
>>> '-O0 -funsafe-math-optimizations'.
>>
>>
>> Hmm, how did you verify that, that it only adds sth with -O0
>> -funsafe-math-optimizations?
>
>
> By checking with various flags, although not exhaustively. I looked for
> reasons for the behavior in match.pd (explained below).
>
> I have also since discovered that the combinations of
> '-funsafe-math-optimizations -frounding-math' (at all levels) and
> '-fno-recriprocal-math -funsafe-math-optimizations' mean this pattern adds
> something.
>
>> Is that because in GIMPLE the reassoc pass should do this transform?
>
> It's because the pattern that changes (X / C) -> X * (1 / C) is gated with
> O1:
>
>   (for cst (REAL_CST COMPLEX_CST VECTOR_CST)
>    (simplify
>     (rdiv @0 cst@1)
> ->    (if (optimize)
> ->     (if (flag_reciprocal_math
>       && !real_zerop (@1))
>       (with
>        { tree tem = const_binop (RDIV_EXPR, type, build_one_cst (type), @1);
> }
>        (if (tem)
>         (mult @0 { tem; } )))
>       (if (cst != COMPLEX_CST)
>        (with { tree inverse = exact_inverse (type, @1); }
>         (if (inverse)
>          (mult @0 { inverse; } ))))))))
>
>
> I've flagged the two lines that are particularly relevant to this.


So this means we go x / (C * y) -> (x / C) / y -> (x * (1/C)) / y
why's that in any way preferable?  I suppose this is again to enable
the recip pass to detect / y (as opposed to / (C * y))?  What's the
reason to believe that / y is more "frequent"?

> Removing this pattern, as I would expect, means that the divisions in the
> above optimization (and the one further down) are not removed.
>
> So then there is the question of edge cases. This pattern is (ignoring the
> second case) going to fail when const_binop returns null. Looking through
> that function says that it will fail (for reals) when:
>
> - Either argument is null (not the case)
>
> - The operation is not in the list (PLUS_EXPR,
>   MINUS_EXPR, MULT_EXPR, RDIV_EXPR, MIN_EXPR, MAX_EXPR)
>   (again not the case)
>
> - We honor Signalling NaNs and one of the operands is a sNaN.
>
> - The operation is a division, and the second argument is zero
>   and dividing by 0.0 raises an exception.
>
> - The result is infinity and neither of the operands were infinity
>   and flag_trapping_math is set.
>
> - The result isn't exact and flag_rounding_math is set.
>
>
> For (x / ( y * C) -> (x / C) / y), I will add some gating for each of these
> so that the pattern is never executed if one of these would be the case.

Why not transform this directly to (x * (1/C)) / y then (and only then)?
That makes it obvious not two divisions prevail.

That said, I'm questioning this canonicalization.  I can come up with a
testcase where it makes things worse:

 tem = x / (y * C);
 tem2 = z / (y * C);

should generate

 rdivtmp = 1 / (y*C);
 tem = x *rdivtmp;
 tem2= z * rdivtmp;

instead of

 rdivtmp = 1/y;
 tem = x * 1/C * rdivtmp;
 tem2 = z * 1/C * rdivtmp;

> The additional cases where this isn't converted to a multiplication by the
> reciprocal appear to be when -freciprocal-math is used, but we don't have
> -fno-rounding-math, or funsafe-math-optimizations.
>
> For the removal of (x / C +- y / C -> (x +- y) / C), there is a trade-off of
> whether it is worth having an extra pattern for these edge cases. On this
> I'm not sure.

Well, at least leave it in fold-const.c if you can't move it.

>>
>> You added
>>
>> +/* Simplify x / (- y) to -x / y.  */
>> +(simplify
>> + (rdiv @0 (negate @1))
>> + (rdiv (negate @0) @1))
>
>
> Sorry, this was unclear, the changes to fold const should be split up from
> the additions to match.pd.
>
> This was one of the changes discussed before. The idea is to canonicalize
> this so that y can be extracted in the cse_reciprocals pass.

As said elsewhere I think cse_reciprocals needs a rewrite to not rely on
arbitrary canonicalization to do its work but consider association
opportunities globally with a focus on CSE.

>
>>
>> for
>>
>>        /* (-A) / (-B) -> A / B  */
>>        if (TREE_CODE (arg0) == NEGATE_EXPR && negate_expr_p (arg1))
>>          return fold_build2_loc (loc, RDIV_EXPR, type,
>>                              TREE_OPERAND (arg0, 0),
>>                              negate_expr (arg1));
>>        if (TREE_CODE (arg1) == NEGATE_EXPR && negate_expr_p (arg0))
>>          return fold_build2_loc (loc, RDIV_EXPR, type,
>>                              negate_expr (arg0),
>>                              TREE_OPERAND (arg1, 0));
>>
>> presumably?  This isn't equivalent.  It's more like
>>
>> (simplify
>>    (rdiv (negate_expr_p @0) (negate @1))
>>    (rdiv (negate @0) @1))
>> (simplify
>>    (rdiv (negate @0) (negate_expr_p @1))
>>    (rdiv @0 (negate @1)))
>>
>> and you should remove the corresponding fold-const.c code.
>>
>> Please do these changes independently to aid bisecting in case of issues.
>
>
> I'll resubmit two different patches when I can get them separated. It should
> also make it easier to review when it is clearer what belongs where.

Thanks.

>>
>>   (if (flag_reciprocal_math)
>> - /* Convert (A/B)/C to A/(B*C)  */
>> + /* Convert (A/B)/C to A/(B*C) where neither B nor C are constant.  */
>>    (simplify
>>     (rdiv (rdiv:s @0 @1) @2)
>> -   (rdiv @0 (mult @1 @2)))
>> +   (if (TREE_CODE (@1) != REAL_CST && TREE_CODE (@1) != REAL_CST)
>> +    (rdiv @0 (mult @1 @2))))
>>
>> why?  I guess to avoid ping-poning with
>
>
> Yes, although there is a mistake there. It should be:
>
> (if (TREE_CODE (@1) != REAL_CST && TREE_CODE (@2) != REAL_CST)
>
> I'll fix that up.
>
>>
>> + /* Simplify x / (C * y) to (x / C) / y where C is a constant.  */
>> + (simplify
>> +  (rdiv @0
>> +   (mult @1 REAL_CST@2))
>> +  (rdiv (rdiv @0 @2) @1))
>>
>> ?  If so why not just disallow for @1 == REAL_CST?
>
>
> The ping-ponging issue is there even when only one of the operands is a
> constant (which of course it has to be for this pattern to apply). For
> example, something like x / (y * 3), with '-O0 -ffast-math',
> when the reciprocal isn't computed, ping pongs back and forth.
>
> I'm not sure that it can be fixed without a condition on the pattern already
> there.
>
>
>>
>>> On O1 and up, the pattern that replaces 'x / C' with 'x * (1 / C)'
>>> is enabled and then the pattern is covered by that and
>>> (x * C +- y * C -> C * (x +- y)) (which is already present in match.pd)
>>>
>>> I have also updated the testcase for those optimizations to use 'O1' to
>>> avoid that case.
>>>
>>>
>>> On 08/24/2017 10:06 PM, Jeff Law wrote:
>>>>
>>>>
>>>> On 08/17/2017 03:55 AM, Wilco Dijkstra wrote:
>>>>>
>>>>>
>>>>> Richard Biener wrote:
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 15, 2017 at 4:11 PM, Wilco Dijkstra
>>>>>> <wilco.dijks...@arm.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Richard Biener wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We also change the association of
>>>>>>>>>
>>>>>>>>>         x / (y * C) -> (x / C) / y
>>>>>>>>>
>>>>>>>>> If C is a constant.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Why's that profitable?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> It enables (x * C1) / (y * C2) -> (x * C1/C2) / y for example.
>>>>>>> Also 1/y is now available to the reciprocal optimization, see
>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026 for details.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sure, but on its own it's going to be slower.  So this isn't the
>>>>>> correct way to enable those followup transforms.
>>>>>
>>>>>
>>>>>
>>>>> How can it be any slower? It's one division and one multiply in both
>>>>> cases.
>>>>
>>>>
>>>> x / (y * C) -> (x / C) / y
>>>>
>>>> Goes from one division and one multiplication to two divisions.  I'm
>>>> guessing that's what Richi is (reasonably) concerned about.
>>>>
>>>> So it may be the case that we need more complex pattern matching here
>>>> for when to perform the first transformation to ultimately enable the
>>>> second.
>>>>
>>>
>>> The only case where we don't remove the division but still execute this
>>> pattern is when run with'-O0 -freciprocal-math'.
>>>
>>> As long as we have 'O1' or greater (and -freciprocal-math), that is
>>> enough
>>> to enable the removal of the reciprocal.
>>
>>
>> I don't see this.  I presume you mean this happens in the recip pass?
>
>
> It's one of the match.pd patterns that actually does the extraction since C
> is always constant. I've copied in the pattern above in response to one of
> your previous comments.
>
>> But we don't optimize this when optimizing for size (but the above pattern
>> > still applies) or when targetm.min_divisions_for_recip_mul is too large
>
>
> It was my understanding that this is used to specify the number of divisions
> you have to be able to replace with a multiplication before it is worthwhile
> inserting an extra multiplication.
>
> So for situations like this, I think that this is not quite what is being
> measured, because there isn't an extra multiplication being inserted?
>
>
>>
>> So I still think this is the wrong place to do this and instead the recip
>> pass should be extended.
>>
>>
>>>
>>> Jackson.
>>>
>>>>
>>>> Jeff
>>>>
>>>
>

Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

Reply via email to