Ping.
On Thu, Jun 26, 2014 at 10:54 AM, Sriraman Tallam <tmsri...@google.com> wrote:
> Hi Uros,
>
> Could you please review this patch?
>
> Thanks
> Sri
>
> On Fri, Jun 20, 2014 at 5:17 PM, Sriraman Tallam <tmsri...@google.com> wrote:
>> Patch Updated.
>>
>> Sri
>>
>> On Mon, Jun 9, 2014 at 3:55 PM, Sriraman Tallam <tmsri...@google.com> wrote:
>>> Ping.
>>>
>>> On Mon, May 19, 2014 at 11:11 AM, Sriraman Tallam <tmsri...@google.com> wrote:
>>>> Ping.
>>>>
>>>> On Thu, May 15, 2014 at 11:34 AM, Sriraman Tallam <tmsri...@google.com> wrote:
>>>>> Optimize access to globals with -fpie, x86_64 only:
>>>>>
>>>>> Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the
>>>>> module using the GOT. This takes two instructions: one to get the address
>>>>> of the global from the GOT and the other to get the value. If it turns out
>>>>> that the global gets defined in the executable at link time, the access
>>>>> still has to go through the GOT, as it is too late by then to generate a
>>>>> direct access.
>>>>>
>>>>> Examples:
>>>>>
>>>>> foo.cc
>>>>> ------
>>>>> int a_glob;
>>>>> int main () {
>>>>>   return a_glob; // defined in this file
>>>>> }
>>>>>
>>>>> With -O2 -fpie -pie, the generated code directly accesses the global via a
>>>>> PC-relative insn:
>>>>>
>>>>> 5e0 <main>:
>>>>>   mov 0x165a(%rip),%eax # 1c40 <a_glob>
>>>>>
>>>>> foo.cc
>>>>> ------
>>>>>
>>>>> extern int a_glob;
>>>>> int main () {
>>>>>   return a_glob; // declared extern; the definition may be in another file
>>>>> }
>>>>>
>>>>> With -O2 -fpie -pie, the generated code accesses the global via the GOT
>>>>> using two memory loads:
>>>>>
>>>>> 6f0 <main>:
>>>>>   mov 0x1609(%rip),%rax # 1d00 <_DYNAMIC+0x230>
>>>>>   mov (%rax),%eax
>>>>>
>>>>> This is true even if, in the latter case, the global was defined in the
>>>>> executable through a different file.
>>>>>
>>>>> Some experiments on Google benchmarks show that the extra memory loads
>>>>> affect performance by 1% to 5%.
>>>>>
>>>>>
>>>>> Solution - Copy Relocations:
>>>>>
>>>>> When the linker supports copy relocations, GCC can always assume that the
>>>>> global will be defined in the executable. For globals that are truly extern
>>>>> (that is, they come from shared objects), the linker will create copy
>>>>> relocations and have them defined in the executable. The result is that no
>>>>> global access needs to go through the GOT, which improves performance.
>>>>>
>>>>> This recently submitted patch to the gold linker:
>>>>> https://sourceware.org/ml/binutils/2014-05/msg00092.html
>>>>> allows gold to generate copy relocations in -pie mode when necessary.
>>>>>
>>>>> I have added the option -mld-pie-copyrelocs, which enables this when
>>>>> combined with -fpie. Note that the BFD linker does not support PIE copy
>>>>> relocations yet, so this option cannot be used with it.
>>>>>
>>>>> Please review.
>>>>>
>>>>>
>>>>> ChangeLog:
>>>>>
>>>>> * config/i386/i386.opt (mld-pie-copyrelocs): New option.
>>>>> * config/i386/i386.c (legitimate_pic_address_disp_p): Check if this
>>>>>   address is still legitimate in the presence of copy relocations
>>>>>   and -fpie.
>>>>> * testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c: New test.
>>>>> * testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c: New test.
>>>>>
>>>>>
>>>>>
>>>>> Patch attached.
>>>>> Thanks
>>>>> Sri
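
For anyone who wants to try the access pattern described in the quoted mail, here is a minimal single-file C sketch. The names read_glob and self_check are made up for illustration and are not part of the patch. The definition of a_glob is kept in the same file only so that the block builds on its own; because the compiler can then see the definition, it may well emit the direct PC-relative access. Moving the definition into a second source file of the same executable should correspond to the GOT case from the mail.

```c
#include <assert.h>

/* Declaration only: the compiler learns the type of a_glob here,
   but not which module will end up defining it. */
extern int a_glob;

/* Hypothetical helper: the load of a_glob in this function is the
   access the mail is concerned with. */
int read_glob(void)
{
    return a_glob;
}

/* Definition. In the scenario from the mail this would live in a
   different source file linked into the same executable; it is kept
   here (an assumption of this sketch) so the block compiles alone. */
int a_glob = 42;

/* Hypothetical self-check: the value read is the same whichever
   access sequence the compiler picks; only the generated code differs. */
void self_check(void)
{
    assert(read_glob() == 42);
}
```

Adding a main that calls read_glob(), building with -O2 -fpie -pie, and disassembling with objdump -d should show which access form the compiler chose. The -mld-pie-copyrelocs option is only available with the proposed patch applied, and per the mail it also needs a linker that can generate copy relocations in -pie mode.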