On Thu, Jul 7, 2011 at 5:47 PM, Michael Meissner
<meiss...@linux.vnet.ibm.com> wrote:
> On Thu, Jul 07, 2011 at 10:59:36AM +0200, Richard Guenther wrote:
>> On Thu, Jul 7, 2011 at 12:29 AM, Michael Meissner
>> <meiss...@linux.vnet.ibm.com> wrote:
>> > This patch adds an option to not load the static chain (r11) for 64-bit
>> > PowerPC calls through function pointers (or virtual function).  Most of
>> > the languages on the PowerPC do not need the static chain being loaded
>> > when called, and adding this instruction can slow down code that calls
>> > very short functions.
>> >
>> > In addition, if the function does not call alloca, setjmp or deal with
>> > exceptions where the stack is modified, the compiler can move the store
>> > of the TOC value for the current function to the prologue of the
>> > function, rather than at each call site.
>> >
>> > The effect of these patches is to speed up 464.h264ref in the Spec 2006
>> > benchmark by about 7% if -mno-r11 is used, and 5% if it is not used (but
>> > the save of the TOC register is hoisted).  I believe this is due to the
>> > load of the current function's TOC (r2) having to wait until the store
>> > queue is drained with the store just before the call.
>> >
>> > Unfortunately, I do see a 3% slowdown in 429.mcf, which I don't know
>> > what the cause is.
>> >
>> > I have bootstrapped the compiler and saw that there were no regressions
>> > in make check.  Is it ok to install in the trunk?
>>
>> Hum.  Can't the compiler figure this out itself per-call-site?  At least
>> the name of the command-line switch -m[no-]r11 is meaningless to me.
>> Points-to information should be able to tell you if the function pointer
>> points to a nested function.
>
> No, the compiler cannot figure it out.  Consider the case where a function
> is passed a pointer to a function, such as the standard library function
> qsort.  The call may come from any random module, that isn't part of the
> compilation suite, such as if the function being passed the pointer is in
> a shared library.  You don't know whether the function pointed to uses the
> static chain (i.e. nested function call with trampoline, call to PL/I, or
> other language that does use the static chain, which is part of the ABI).
> The point of the switch is similar to -ffast-math where you say you are
> willing to ignore some corner cases in the standard in order to get better
> performance.
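
For illustration, a minimal sketch of the qsort situation described above, in portable C. The names cmp_int, max_by and demo are hypothetical and not part of the patch. max_by stands in for a library routine that receives only a function pointer. Here the comparator is an ordinary file-scope function, which does not need the static chain, but nothing visible inside max_by says so. With the GNU C nested-function extension, the same pointer could refer to a trampoline that does need it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical comparator: an ordinary file-scope function, so it has
   no enclosing frame and does not need a static chain. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical stand-in for a library routine such as qsort.  It sees
   only the pointer; nothing in this function tells the compiler whether
   the callee expects a static chain, which is the conservative case the
   thread is about. */
static int max_by(const int *v, size_t n,
                  int (*cmp)(const void *, const void *))
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (cmp(&v[i], &v[best]) > 0)   /* indirect call */
            best = i;
    return v[best];
}

/* Sorts a small array through qsort, then finds the largest element
   through the same comparator pointer. */
static int demo(void)
{
    int v[] = { 3, 1, 4, 1, 5 };
    size_t n = sizeof v / sizeof v[0];

    qsort(v, n, sizeof v[0], cmp_int);
    assert(v[0] == 1);                  /* smallest element is now first */
    return max_by(v, n, cmp_int);
}
```

If max_by lived in a shared library built separately from its callers, the compiler building it would have no view of which functions are ever passed in, so it could not prove the static chain load unnecessary without a switch like the one proposed.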

Well, I guess you don't propose to build glibc with -mno-r11?  The compiler
certainly can't figure it out in _all_ cases - but it should be able to
handle most of the cases (with LTO even more cases) ok, no?

I also wonder why loading a register is so expensive compared to the
actual call ...

> I certainly can call the switch -mno-static-chain, which is perhaps more
> meaningful (at least to us compiler folk, I'm not sure static chain means
> much to the normal programmer).

Well, that's up to the target maintainers to decide, maybe
-mno-nested-functions instead?

Richard.