Hi Uros,

Could you please review this patch?

Thanks
Sri

On Fri, Jun 20, 2014 at 5:17 PM, Sriraman Tallam <tmsri...@google.com> wrote:
> Patch Updated.
>
> Sri
>
> On Mon, Jun 9, 2014 at 3:55 PM, Sriraman Tallam <tmsri...@google.com> wrote:
>> Ping.
>>
>> On Mon, May 19, 2014 at 11:11 AM, Sriraman Tallam <tmsri...@google.com> wrote:
>>> Ping.
>>>
>>> On Thu, May 15, 2014 at 11:34 AM, Sriraman Tallam <tmsri...@google.com> wrote:
>>>> Optimize access to globals with -fpie, x86_64 only:
>>>>
>>>> Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the
>>>> module using the GOT. This is two instructions, one to get the address of
>>>> the global from the GOT and the other to get the value. If it turns out
>>>> that the global gets defined in the executable at link-time, it still
>>>> needs to go through the GOT as it is too late then to generate a direct
>>>> access.
>>>>
>>>> Examples:
>>>>
>>>> foo.cc
>>>> ------
>>>> int a_glob;
>>>> int main () {
>>>>   return a_glob; // defined in this file
>>>> }
>>>>
>>>> With -O2 -fpie -pie, the generated code directly accesses the global via
>>>> a PC-relative insn:
>>>>
>>>> 5e0 <main>:
>>>>    mov    0x165a(%rip),%eax        # 1c40 <a_glob>
>>>>
>>>> foo.cc
>>>> ------
>>>>
>>>> extern int a_glob;
>>>> int main () {
>>>>   return a_glob; // not defined in this file
>>>> }
>>>>
>>>> With -O2 -fpie -pie, the generated code accesses the global via the GOT
>>>> using two memory loads:
>>>>
>>>> 6f0 <main>:
>>>>    mov    0x1609(%rip),%rax   # 1d00 <_DYNAMIC+0x230>
>>>>    mov    (%rax),%eax
>>>>
>>>> This is true even if in the latter case the global was defined in the
>>>> executable through a different file.
>>>>
>>>> Some experiments on Google benchmarks show that the extra memory loads
>>>> affect performance by 1% to 5%.
>>>>
>>>>
>>>> Solution - Copy Relocations:
>>>>
>>>> When the linker supports copy relocations, GCC can always assume that the
>>>> global will be defined in the executable. For globals that are truly
>>>> extern (come from shared objects), the linker will create copy
>>>> relocations and have them defined in the executable. The result is that
>>>> no global access needs to go through the GOT, which improves performance.
>>>>
>>>> This patch to the gold linker:
>>>> https://sourceware.org/ml/binutils/2014-05/msg00092.html
>>>> submitted recently allows gold to generate copy relocations for -pie mode
>>>> when necessary.
>>>>
>>>> I have added the option -mld-pie-copyrelocs which, when combined with
>>>> -fpie, would do this. Note that the BFD linker does not support pie
>>>> copyrelocs yet and this option cannot be used there.
>>>>
>>>> Please review.
>>>>
>>>>
>>>> ChangeLog:
>>>>
>>>> * config/i386/i386.opt (mld-pie-copyrelocs): New option.
>>>> * config/i386/i386.c (legitimate_pic_address_disp_p): Check if this
>>>> address is still legitimate in the presence of copy relocations
>>>> and -fpie.
>>>> * testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c: New test.
>>>> * testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c: New test.
>>>>
>>>>
>>>>
>>>> Patch attached.
>>>> Thanks
>>>> Sri
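
For anyone skimming the thread, here is a rough C model of the two access patterns described in the original message. It is only a sketch under stated assumptions: got_slot is a hypothetical stand-in for a GOT entry (the real slot is created by the linker, not written in source), the function names are made up for illustration, and the initial value 42 is arbitrary.

```c
#include <assert.h>

int a_glob = 42;  /* the global from the examples; 42 is an arbitrary value */

/* Hypothetical stand-in for a GOT slot: a pointer cell holding &a_glob.
   volatile keeps the compiler from folding the indirection away. */
static int *volatile got_slot = &a_glob;

/* Models the GOT sequence: load the address, then load the value. */
int load_via_got(void)
{
    int *addr = got_slot;  /* first load: address read from the "GOT" */
    return *addr;          /* second load: the value itself */
}

/* Models the direct case: one load from a location known at link time. */
int load_direct(void)
{
    return a_glob;
}

/* Both paths must observe the same object, before and after a store. */
void check_same_object(void)
{
    assert(load_via_got() == load_direct());
    a_glob = 7;
    assert(load_via_got() == 7);
    assert(load_direct() == 7);
}
```

Calling check_same_object() from a test harness should pass silently. The point of the model is only that the indirect path costs one extra memory load for the same result, which is the overhead the copy relocation approach is meant to remove.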