Re: [google] Patch to support calling multi-versioned functions via new GCC builtin. (issue4440078)

Richard Guenther Tue, 03 May 2011 03:00:33 -0700

On Tue, May 3, 2011 at 1:07 AM, Xinliang David Li <[email protected]> wrote:
> On Mon, May 2, 2011 at 2:33 PM, Richard Guenther
> <[email protected]> wrote:
>> On Mon, May 2, 2011 at 6:41 PM, Xinliang David Li <[email protected]> wrote:
>>> On Mon, May 2, 2011 at 2:11 AM, Richard Guenther
>>> <[email protected]> wrote:
>>>> On Fri, Apr 29, 2011 at 6:23 PM, Xinliang David Li <[email protected]> wrote:
>>>>> Here is the background for this feature:
>>>>>
>>>>> 1) People rely on function multi-versioning to explore hw features and
>>>>> squeeze performance, but there is no standard way of doing so; they
>>>>> end up either a) using indirect function calls with function pointers
>>>>> set at program initialization; b) using manual dispatch at each
>>>>> callsite; or c) using features like IFUNC.  The dispatch mechanism
>>>>> needs to be promoted to the language level and become a first-class
>>>>> citizen;
>>>>
>>>> You are not doing that, you are inventing a new (crude) GCC extension.
>>>
>>> To capture the high-level semantics and prevent the user from lowering
>>> the dispatch calls into forms the compiler cannot recognize, a language
>>> extension is the way to go.
>>
>> I don't think so.  With your patch only two passes understand the new
>> high-level form, the rest of the gimple passes are just confused.
>
> There is no need for other passes to understand it -- just treat it as
> opaque calls. This is goodness otherwise other passes need to be
> modified. This is true (only some passes understand it) for things
> like __builtin_expect.


Certainly __builtin_dispatch has to be understood by alias analysis and
all other passes that care about calls (like all IPA passes).  You can
of course treat it conservatively (it may call any function, even those
whose address is not taken, and it may clobber and read all memory, even
memory that doesn't escape the TU).

Why obfuscate things when it is not necessary?

>>>
>>> 1) the desired optimization may not happen subject to many compiler
>>> heuristic changes;
>>> 2) it has other side effects such as wrong estimation of function size
>>> which may impact inlining
>>
>> May, may ... so you say all this can't happen under any circumstance
>> with your special code and passes?
>
> No that is not my argument. What I tried to say is it will be harder
> to achieve without high level semantics -- it requires more
> handshaking between compiler passes.

Sure - that's life.

>> Which nobody will see benefit
>> from unless they rewrite their code?
>
> The target users for the builtin include compiler itself -- it can
> synthesize dispatch calls.

Hum.  I'm not at all sure the dispatch calls are the best representation
for the IL.

>> Well, I say if we can improve
>> _some_ of the existing usages that's better than never doing wrong
>> on a new language extension.
>
> This is independent.

It is not.

>> One that I'm not convinced is the way
>> to go (you didn't address at all the inability to use float arguments
>> and the ABI issues with using variadic arguments - after all you
>> did a poor-mans language extension by using GCC builtins instead
>> of inventing a true one).
>
> This is an independent issue that either needs to be addressed or
> marked as limitation. The key of the debate is whether source/IR
> annotation using construct with high level semantics helps optimizer.
> In fact this is common. Would it make any difference (in terms of
> acceptance) if the builtin is only used internally by the compiler and
> not exposed to the user?

No.  I don't see at all why having everything in a single stmt is so much
more convenient.  And I don't see why existing IL features cannot be
used to make things a little more convenient.

>>> 3) it limits the lowering into one form which may not be ideal  --
>>> with builtin_dispatch, after hoisting optimization, the lowering can
>>> use more efficient IFUNC scheme, for instance.
>>
>> I see no reason why we cannot transform a switch-indirect-call
>> pattern into an IFUNC call.
>>
>
> It is possible -- but it is like asking user to lower the dispatch and
> tell compiler to raise it again ..

There is no possibility for a high-level dispatch at the source level.
And if I'd have to design one I would use function overloading, like

float compute_sth (float) __attribute__((version("sse4")))
{
  ... sse4 code ...
}

float compute_sth (float)
{
  ... fallback ...
}

float foo (float f)
{
  return compute_sth (f);
}

and if you not only want to dispatch for target features you could
specify a selector function and value in the attribute.  You might
notice that the above eventually matches the target attribute
directly, just the frontends need to be taught to emit dispatch
code whenever overload resolution results in ambiguities involving
target attribute differences.
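
As a rough sketch only (not taken from any patch), the dispatch code a
frontend could emit for the example above might look like the following
plain C.  The names cpu_has_sse4, compute_sth_sse4, compute_sth_generic
and resolve_compute_sth are made up for illustration, and the feature
probe is a stub that always reports "no sse4"; a real one would query
the CPU.

```c
/* Hypothetical feature probe (assumption): a real implementation would
   use CPUID or similar.  This stub always selects the fallback.  */
static int cpu_has_sse4 (void)
{
  return 0;
}

/* Stand-in for the version marked with the "sse4" attribute.  */
static float compute_sth_sse4 (float f)
{
  return f * 2.0f;
}

/* Stand-in for the unattributed fallback version.  */
static float compute_sth_generic (float f)
{
  return f * 2.0f;
}

typedef float (*compute_sth_fn) (float);

/* Pick the version once, based on the selector.  */
static compute_sth_fn resolve_compute_sth (void)
{
  return cpu_has_sse4 () ? compute_sth_sse4 : compute_sth_generic;
}

/* What a call to the overloaded compute_sth would lower to: resolve on
   first use, then call indirectly through the cached pointer.  */
static float compute_sth (float f)
{
  static compute_sth_fn impl;
  if (!impl)
    impl = resolve_compute_sth ();
  return impl (f);
}

float foo (float f)
{
  return compute_sth (f);
}
```

The caller foo stays exactly as written at the source level; only the
lowering of the ambiguous overload changes, so the same source could as
well be lowered to an IFUNC instead of a cached pointer.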

Now, a language extension to support multi-versioning should be
completely independent of any IL representation - by using
a builtin you are tying them together with the only convenient
mechanism we have - a mechanism that isn't optimal for either
side IMNSHO.

>>>> My point is that such optimization is completely independent of
>>>> that dispatch thing.  The above could as well be a selection to
>>>> use different input arrays, one causing alias analysis issues
>>>> and one not.
>>>>
>>>> Thus, a __builtin_dispatch centric optimization pass is the wrong
>>>> way to go.
>>>
>>> I agree that many things can be implemented in different ways, but a
>>> high level standard builtin_dispatch mechanism doing hoisting
>>> interprocedurally is cleaner and simpler and targeted for function
>>> multi-versioning -- it does not depend on/rely on later phase's
>>> heuristic tunings to make the right things to happen. Function MV
>>> deserves this level of treatment as it will become more and more
>>> important for some users (e.g., Google).
>>
>> But inventing a new language extension to benefit from whatever
>> improvements we implement isn't the obviously best choice.
>
> It is not for any improvement. I mentioned the potential for function
> MV and want to have a compiler infrastructure to deal with it.
>
>
>>
>>>> Now, with FDO I'd expect the foo is inlined into bar (because foo
>>>> is deemed hot),
>>>
>>> That is a myth -- the truth is that there are other heuristics which
>>> can prevent this from happening.
>>
>> Huh, sure.  That doesn't make my expectation a myth.
>>
>>>> then you only need to deal with loop unswitching,
>>>> which should be easy to drive from FDO.
>>>
>>> Same here -- the loop body may not be well formed/recognized. The loop
>>> nests may not be perfectly nested, or other unswitching heuristics may
>>> block it from happening.  This is the common problem for many other
>>> things that get lowered too early. It is cleaner to make the high
>>> level transformation first in IPA, and let unswitching deal with
>>> intra-procedural optimization.
>>
>> If it's not well-formed inlining the call does not make it well-formed and
>> thus it won't be optimized well anyway.  Btw, for the usual cases I
>> have seen inlining isn't a transform that is worthwhile - transforming
>> to IFUNC would have been.
>
> I am not sure I understand the comment here. The proposed approach can
> do interprocedural hoisting of the dispatch, and this is done pretty
> early in the pipeline so that hot functions can be optimized as much as
> possible. Lowering it early in the hot functions and relying on later
> phases to deal with it has the following limitations:
>
> 0) lowered code can get in the way of effective optimization
> 1) the lowered code can be transformed such that it can not be
> unswitched or can not be raised properly
> 2) all functions in the related call chain need to be inlined for the
> unswitching to happen interprocedurally
> 3) relies on unswitching heuristic to kick in

Sure, we have such issues everywhere in the compiler.  I don't see why
MV is special.  In fact there are surely cases that can be constructed
where we rely on earlier optimizations to make MV transforms possible.

Pass ordering issues are independent of the design of a language
extension and independent of what canonical IL representation we want
to have to help MV related optimizations.

Thanks,
Richard.
