On Thu, Jul 7, 2011 at 5:47 PM, Michael Meissner
<meiss...@linux.vnet.ibm.com> wrote:
> On Thu, Jul 07, 2011 at 10:59:36AM +0200, Richard Guenther wrote:
>> On Thu, Jul 7, 2011 at 12:29 AM, Michael Meissner
>> <meiss...@linux.vnet.ibm.com> wrote:
>> > This patch adds an option to not load the static chain (r11) for 64-bit
>> > PowerPC calls through function pointers (or virtual functions).  Most of
>> > the languages on the PowerPC do not need the static chain loaded when
>> > called, and adding this instruction can slow down code that calls very
>> > short functions.
>> >
>> > In addition, if the function does not call alloca, setjmp or deal with
>> > exceptions where the stack is modified, the compiler can move the store
>> > of the TOC value for the current function to the prologue of the
>> > function, rather than at each call site.
>> >
>> > The effect of these patches is to speed up 464.h264ref in the Spec 2006
>> > benchmark by about 7% if -mno-r11 is used, and 5% if it is not used (but
>> > the save of the TOC register is hoisted).  I believe this is due to the
>> > load of the current function's TOC (r2) having to wait until the store
>> > queue is drained with the store just before the call.
>> >
>> > Unfortunately, I do see a 3% slowdown in 429.mcf, and I don't know what
>> > the cause is.
>> >
>> > I have bootstrapped the compiler and saw that there were no regressions
>> > in make check.  Is it ok to install in the trunk?
>>
>> Hum.  Can't the compiler figure this out itself per-call-site?  At least
>> the name of the command-line switch -m[no-]r11 is meaningless to me.
>> Points-to information should be able to tell you if the function pointer
>> points to a nested function.
>
> No, the compiler cannot figure it out.  Consider the case where a function is
> passed a pointer to a function, such as the standard library function qsort.
> The call may come from any random module that isn't part of the compilation
> suite, such as if the function being passed the pointer is in a shared
> library.  You don't know whether the function pointed to uses the static
> chain (i.e. nested function call with trampoline, call to PL/I, or other
> language that does use the static chain, which is part of the ABI).  The
> point of the switch is similar to -ffast-math where you say you are willing
> to ignore some corner cases in the standard in order to get better
> performance.
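
For reference, the situation described above can be sketched in portable C.
This is only an illustration under stated assumptions, not how the PowerPC
ABI or the patch implements anything: the static chain is emulated here as
an explicit first argument, and the names `env`, `plain_cb`, `nested_cb` and
`call_through` are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for the static chain: the environment that a
   nested function would receive implicitly (r11 on 64-bit PowerPC) is
   passed explicitly here so the example stays portable. */
struct env { int bias; };

/* Ordinary callback: never looks at the chain, as most C code would. */
static int plain_cb(const struct env *chain, int x)
{
    (void)chain;
    return x + 1;
}

/* "Nested-style" callback: needs state from the enclosing frame. */
static int nested_cb(const struct env *chain, int x)
{
    return x + chain->bias;
}

/* Library-style caller (think qsort in a shared library): it only sees a
   pointer, cannot tell which kind of callee it has, and so must always
   supply the chain. */
static int call_through(int (*fn)(const struct env *, int),
                        const struct env *chain, int x)
{
    return fn(chain, x);
}
```

With `plain_cb` the chain argument is wasted work at the call site; with
`nested_cb` omitting it would break the callee, and `call_through` has no
way to distinguish the two from the pointer alone.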

Well, I guess you don't propose to build glibc with -mno-r11?  The compiler
certainly can't figure it out in _all_ cases - but it should be able to handle
most of the cases (with LTO even more cases) ok, no?

I also wonder why loading a register is so expensive compared to the
actual call ...

> I certainly can call the switch -mno-static-chain, which is perhaps more
> meaningful (at least to us compiler folk, I'm not sure static chain means much
> to the normal programmer).

Well, that's up to the target maintainers to decide, maybe
-mno-nested-functions instead?

Richard.
