The patch improves performance when the previous patches are applied. It makes RTL loop invariant behavior for GOT loads the same as it was before the 2 previous patches.
It improves 164.gzip (+9%) and 253.perlbmk (+2%), giving ~0.5% to SPEC2000int (compiled with "-m32 -Ofast -flto -funroll-loops -fPIC").

For example, in 164.gzip.

Before enabling EBX:

loop begin:
(I) 1. SI162 = prev (global address)
    2. SI163 = SI142 & 0xfff (SI142 modified in the loop)
    3. SI164 = EBX + SI162
    4. HI107 = SI163*2 + SI164
    5. SI142 = HI107

Only INSN 1 is treated as invariant, and later combine propagates 2, 3, 4 into 5.

After enabling EBX:

loop begin:
(I) 1. SI163 = prev (global address)
    2. SI164 = SI142 & 0xfff (SI142 modified in the loop)
(I) 3. SI165 = SI143 + SI163
    4. HI107 = SI164*2 + SI165
    5. SI142 = HI107

INSNs 1 and 3 are treated as invariants (143 is the GOT register) and hoisted outside the loop. After that, the combine pass is unable to combine INSNs inside and outside the loop, which leads to higher register pressure and therefore new spills/fills.

The patch fixes the x86 address cost so that the cost for addresses containing the GOT register is lowered to what it was before enabling EBX.

In ix86_address_cost the result of "REGNO (parts.base) >= FIRST_PSEUDO_REGISTER" for hard ebx was always false. The patch makes the condition give the same result when parts.base is the GOT register (and likewise for parts.index).

2014-10-08  Evgeny Stupachenko  <evstu...@gmail.com>

    * gcc/config/i386/i386.c (ix86_address_cost): Lower cost when
    address contains GOT register.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b43e870..9d8cfd1 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12497,8 +12497,12 @@ ix86_address_cost (rtx x, enum machine_mode, addr_space_t, bool)
     cost++;
 
   if (parts.base
+      && (!pic_offset_table_rtx
+          || REGNO (pic_offset_table_rtx) != REGNO(parts.base))
       && (!REG_P (parts.base) || REGNO (parts.base) >= FIRST_PSEUDO_REGISTER)
       && parts.index
+      && (!pic_offset_table_rtx
+          || REGNO (pic_offset_table_rtx) != REGNO(parts.index))
       && (!REG_P (parts.index) || REGNO (parts.index) >= FIRST_PSEUDO_REGISTER)
       && parts.base != parts.index)
     cost++;
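
For readers who want to see the effect of the changed condition in isolation, here is a small standalone model of it. It is only a sketch, not GCC code: registers are plain integers, non-register bases/indexes are not modelled, and FIRST_PSEUDO, NO_REG, struct addr_parts and two_reg_penalty are hypothetical names made up for this illustration (the real FIRST_PSEUDO_REGISTER value depends on the GCC version).

```c
#include <stdbool.h>

/* Hypothetical stand-ins, not the real GCC definitions. */
enum { NO_REG = -1, FIRST_PSEUDO = 76 };

/* Simplified address: register numbers only, NO_REG when a part is absent. */
struct addr_parts { int base; int index; };

/* Models only the second "cost++" test touched by the patch:
   returns 1 when base and index are two distinct pseudo registers.
   With patched == true, an address whose base or index is the GOT
   register (got_regno) is exempt, as hard EBX implicitly was before. */
static int two_reg_penalty(struct addr_parts p, int got_regno, bool patched)
{
    if (p.base == NO_REG || p.index == NO_REG)
        return 0;
    if (patched && got_regno != NO_REG
        && (p.base == got_regno || p.index == got_regno))
        return 0;
    if (p.base >= FIRST_PSEUDO && p.index >= FIRST_PSEUDO
        && p.base != p.index)
        return 1;
    return 0;
}
```

In this model, a base of hard EBX (register 3 in the i386 numbering) with a pseudo index gets no penalty. A base of pseudo 143 acting as the GOT register with index 164 gets a penalty of 1 when patched is false and 0 when patched is true, while two ordinary pseudos still get the penalty either way.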