Re: Extend tree-call-cdce to calls whose result is used

Richard Sandiford Fri, 13 Nov 2015 05:13:06 -0800

Richard Biener <richard.guent...@gmail.com> writes:
> On Mon, Nov 9, 2015 at 10:03 PM, Michael Matz <m...@suse.de> wrote:
>> Hi,
>>
>> On Mon, 9 Nov 2015, Richard Sandiford wrote:
>>
>>> +static bool
>>> +can_use_internal_fn (gcall *call)
>>> +{
>>> +  /* Only replace calls that set errno.  */
>>> +  if (!gimple_vdef (call))
>>> +    return false;
>>
>> Oh, I managed to confuse this in my head while reading the patch.  So,
>> hmm, you don't actually replace the builtin with an internal function
>> (without the condition) under no-errno-math?  Does something else do that?
>> Because otherwise that seems an unnecessary restriction?
>>
>>> >> r229916 fixed that for the non-EH case.
>>> >
>>> > Ah, missed it.  Even the EH case shouldn't be difficult.  If the
>>> > original dominator of the EH destination was the call block it moves,
>>> > otherwise it remains unchanged.
>>>
>>> The target of the edge is easy in itself, I agree, but that isn't
>>> necessarily the only affected block, if the EH handler doesn't
>>> exit or rethrow.
>>
>> You're worried the non-EH and the EH regions merge again, right?  Like so:
>>
>> before change:
>>
>>         BB1: throwing-call
>>      fallthru/           \EH
>>     BB2                   BBeh
>>      |                   /....\ (stuff in EH-region)
>>      |                     /    some path out of EH region
>>      | /------------------/
>>     BB3
>>
>> Here, BB3 must at least be dominated by BB1 (the throwing block), or by
>> something further up (when there are other side-entries to the path
>> BB2->BB3 or into the EH region).  When further up, nothing changes, when
>> it's BB1, then it's afterwards dominated by the BB containing the
>> condition.  So everything with idom==BB1 gets idom=Bcond, except for BBeh,
>> which gets idom=Bcall.  Depending on how you split BB1, either Bcond or
>> BBcall might still be BB1 and doesn't lead to changes in the dom tree.
>>
>>> > Currently we have quite some of such passes (reassoc, forwprop,
>>> > lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap
>>> > and others), but they are all handling only special situations in one
>>> > way or the other.  pass_fold_builtins is another one, but it seems
>>> > most related to what you want (replacing a call with something else),
>>> > so I thought that'd be the natural choice.
>>>
>>> Well, to be pedantic, it's not really replacing the call.  Except for
>>> the special case of targets that support direct assignments to errno,
>>> it keeps the original call but ensures that it isn't usually executed.
>>> From that point of view it doesn't really seem like a fold.
>>>
>>> But I suppose that's just naming again :-).  And it's easily solved with
>>> s/fold/rewrite/.
>>
>> Exactly, in my mind pass_fold_builtin (like many of the others I
>> mentioned) doesn't do folding but rewriting :)
>
> So I am replying here to the issue of where to do the transform call_cdce
> does and the one Richard wants to add.  For example we "lower"
> posix_memalign as early as GIMPLE lowering (that's before CFG construction).
> We also lower sincos to cexpi during GENERIC folding (or if that is dropped
> either GIMPLE lowering or GIMPLE folding during gimplification would be
> appropriate).
>
> Now, with offloading we have to avoid creating target dependencies before
> LTO stream-out (thus no IFN replacements before that - not sure if
> Richards patches have an issue there already).


No, this patch was the earliest point at which we converted to internal
functions.  The idea was to make code treat ECF_PURE built-in functions
and internal functions as being basically equivalent.  There's therefore
not much benefit to doing a straight replacement of one with the other
during either GENERIC or gimple.  Instead the series only used internal
functions for things that built-in functions couldn't do, specifically:

- the case used in this patch, to optimise part of a non-pure built-in
  function using a pure equivalent.

- vector versions of built-in functions.

The cfgexpand patch makes sure that pure built-in functions are expanded
like internal functions where possible.

> Which would leave us with a lowering stage early in the main
> optimization pipeline - I think fold_builtins pass is way too late but
> any "folding" pass will do (like forwprop or backprop where the latter
> might be better because it might end up computing FP "ranges" to
> improve the initial lowering code).

This isn't at all related to what backprop is doing though.
backprop is about optimising definitions based on information
about all uses.

Does fold_builtins need to be as late as it is?

> Of course call_cdce is as good as long as it still exists.

Does this meann that you're not against the patch in principle
(i.e. keeping call_cdce for now and extending it in the way that
this patch does)?

Thanks,
Richard

Re: Extend tree-call-cdce to calls whose result is used

Reply via email to