GCC built with the latest binutils and the patch gives the following performance improvements: spec2000INT +3% at "-O2 -m32", +1.5% at "-O2 -m64".
Some other benchmark scores at "-O2" were also improved up to 6%.
The patch is very efficient for PIE mode.

Thanks,
Evgeny

On Tue, May 5, 2015 at 6:30 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Wed, Apr 22, 2015 at 9:34 AM, H.J. Lu <hongjiu...@intel.com> wrote:
>> Normally, with PIE, GCC accesses globals that are extern to the module
>> using GOT. This is two instructions, one to get the address of the global
>> from GOT and the other to get the value. Examples:
>>
>> ---
>> extern int a_glob;
>> int
>> main ()
>> {
>>   return a_glob;
>> }
>> ---
>>
>> With PIE, the generated code accesses global via GOT using two memory
>> loads:
>>
>>         movq    a_glob@GOTPCREL(%rip), %rax
>>         movl    (%rax), %eax
>>
>> for 64-bit or
>>
>>         movl    a_glob@GOT(%ecx), %eax
>>         movl    (%eax), %eax
>>
>> for 32-bit.
>>
>> Some experiments on google and SPEC CPU benchmarks show that the extra
>> instruction affects performance by 1% to 5%.
>>
>> Solution - Copy Relocations:
>>
>> When the linker supports copy relocations, GCC can always assume that
>> the global will be defined in the executable. For globals that are
>> truly extern (come from shared objects), the linker will create copy
>> relocations and have them defined in the executable. Result is that
>> no global access needs to go through GOT and hence improves performance.
>> We can generate
>>
>>         movl    a_glob(%rip), %eax
>>
>> for 64-bit and
>>
>>         movl    a_glob@GOTOFF(%eax), %eax
>>
>> for 32-bit. This optimization only applies to undefined non-weak
>> non-TLS global data. Undefined weak global or TLS data access still
>> must go through GOT.
>>
>> This patch reverts legitimate_pic_address_disp_p change made in revision
>> 218397, which only applies to x86-64. Instead, this patch updates
>> targetm.binds_local_p to indicate if undefined non-weak non-TLS global
>> data is defined locally in PIE. It also introduces a new target hook,
>> binds_tls_local_p to distinguish TLS variable from non-TLS variable. By
>> default, binds_tls_local_p is the same as binds_local_p.
>>
>> This patch checks if 32-bit and 64-bit linkers support PIE with copy
>> reloc at configure time. 64-bit linker is enabled in binutils 2.25
>> and 32-bit linker is enabled in binutils 2.26. This optimization
>> is enabled only if the linker support is available.
>>
>> Tested on Linux/x86-64 with -m32 and -m64, using linkers with and without
>> support for copy relocation in PIE. OK for trunk?
>>
>> Thanks.
>>
>> H.J.
>> ---
>> gcc/
>>
>>         PR target/65846
>>         * configure.ac (HAVE_LD_PIE_COPYRELOC): Renamed to ...
>>         (HAVE_LD_64BIT_PIE_COPYRELOC): This.
>>         (HAVE_LD_32BIT_PIE_COPYRELOC): New. Defined to 1 if Linux/ia32
>>         linker supports PIE with copy reloc.
>>         * output.h (default_binds_tls_local_p): New.
>>         (default_binds_local_p_3): Add 2 bool arguments.
>>         * target.def (binds_tls_local_p): New target hook.
>>         * varasm.c (decl_default_tls_model): Replace targetm.binds_local_p
>>         with targetm.binds_tls_local_p.
>>         (default_binds_local_p_3): Add a bool argument to indicate TLS
>>         variable and a bool argument to indicate if an undefined non-TLS
>>         non-weak data is local. Double check TLS variable. If an
>>         undefined non-TLS non-weak data is local, treat it as defined
>>         locally.
>>         (default_binds_local_p): Pass false and false to
>>         default_binds_local_p_3.
>>         (default_binds_local_p_2): Likewise.
>>         (default_binds_local_p_1): Likewise.
>>         (default_binds_tls_local_p): New.
>>         * config.in: Regenerated.
>>         * configure: Likewise.
>>         * doc/tm.texi: Likewise.
>>         * config/i386/i386.c (legitimate_pic_address_disp_p): Don't
>>         check HAVE_LD_PIE_COPYRELOC here.
>>         (ix86_binds_local): New.
>>         (ix86_binds_tls_local_p): Likewise.
>>         (ix86_binds_local_p): Use it.
>>         (TARGET_BINDS_TLS_LOCAL_P): New.
>>         * doc/tm.texi.in (TARGET_BINDS_TLS_LOCAL_P): New hook.
>>
>> gcc/testsuite/
>>
>>         PR target/65846
>>         * gcc.target/i386/pie-copyrelocs-1.c: Updated for ia32.
>>         * gcc.target/i386/pie-copyrelocs-2.c: Likewise.
>>         * gcc.target/i386/pie-copyrelocs-3.c: Likewise.
>>         * gcc.target/i386/pie-copyrelocs-4.c: Likewise.
>>         * gcc.target/i386/pr32219-9.c: Likewise.
>>         * gcc.target/i386/pr32219-10.c: New file.
>>
>>         * lib/target-supports.exp (check_effective_target_pie_copyreloc):
>>         Check HAVE_LD_64BIT_PIE_COPYRELOC and HAVE_LD_32BIT_PIE_COPYRELOC
>>         instead of HAVE_LD_64BIT_PIE_COPYRELOC.
>
> Richard, Jeff,
>
> Can you review this patch:
>
> https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01331.html
>
> Thanks.
>
> --
> H.J.
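
For readers skimming the thread, the condition described in the quoted message (only undefined non-weak non-TLS global data qualifies, and only when the linker supports copy relocations in PIE) can be written down as a small truth-table model. This is only a rough sketch: the struct and both function names are made up for illustration and are not the code in varasm.c or i386.c. The load counts simply follow the assembly sequences quoted above.

```c
#include <assert.h>   /* only needed if you add assert-based checks */
#include <stdbool.h>

/* Illustrative model only; these names are hypothetical, not GCC internals.
   Describes an undefined (extern) global data symbol referenced from a PIE.  */
struct undefined_global
{
  bool is_weak;  /* declared weak */
  bool is_tls;   /* thread-local storage */
};

/* True when the access may skip the GOT because the linker is expected to
   satisfy the reference with a copy relocation in the executable.  */
static bool
treated_as_local_in_pie (struct undefined_global sym,
                         bool linker_has_pie_copyreloc)
{
  if (!linker_has_pie_copyreloc)
    return false;               /* optimization is off without linker support */
  if (sym.is_weak || sym.is_tls)
    return false;               /* weak and TLS data still go through the GOT */
  return true;
}

/* Memory loads per read of the global, following the quoted assembly:
   two via the GOT, one for the direct access.  */
static int
loads_per_access (struct undefined_global sym, bool linker_has_pie_copyreloc)
{
  return treated_as_local_in_pie (sym, linker_has_pie_copyreloc) ? 1 : 2;
}
```

With this model, a plain `extern int a_glob;` costs one load when the linker supports PIE copy relocations and two loads otherwise, while a weak or TLS declaration stays at two loads in either case.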