The patch improves performance when the previous patches are applied.
It makes RTL loop-invariant behavior for GOT loads the same as it was
before the 2 previous patches.

It improves 164.gzip (+9%) and 253.perlbmk (+2%), giving ~0.5% on SPEC2000int
(compiled with "-m32 -Ofast -flto -funroll-loops -fPIC").

For example, in 164.gzip:

Before enabling EBX:
loop begin:

(I) 1. SI162 = prev (global address)
    2. SI163 = SI142 & 0xfff (SI142 is modified in the loop)
    3. SI164 = EBX + SI162
    4. HI107 = SI163*2 + SI164
    5. SI142 = HI107

Only INSN 1 is treated as invariant, and combine later propagates 2, 3 and 4 into 5.

After enabling EBX:
loop begin:

(I) 1. SI163 = prev (global address)
    2. SI164 = SI142 & 0xfff (SI142 is modified in the loop)
(I) 3. SI165 = SI143 + SI163
    4. HI107 = SI164*2 + SI165
    5. SI142 = HI107

INSNs 1 and 3 are treated as invariants (SI143 is the GOT register) and
are hoisted out of the loop.

After that, the combine pass is unable to combine INSNs inside and outside
the loop, which leads to higher register pressure and therefore new
spills/fills.
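
For orientation, the RTL above is what one would expect from a loop of
roughly the following shape. This is only a sketch reconstructed from the
listings, not the actual gzip source: the function name, the array size and
the loop bound are assumptions; only the global "prev", the 0xfff mask and
the 16-bit (HImode) load come from the listings.

```c
/* Hypothetical reduction of the loop; not the real 164.gzip code.  */
unsigned short prev[0x1000];            /* global, reached via the GOT with -fPIC */

unsigned follow_chain(unsigned cur, unsigned steps)
{
    while (steps-- > 0) {
        /* INSN 2: cur & 0xfff (cur changes on every iteration)
           INSN 1 and 3: address of prev, formed from the GOT register
           INSN 4: 16-bit load from prev + index * 2
           INSN 5: the loaded value becomes the next cur  */
        cur = prev[cur & 0xfffu];
    }
    return cur;
}
```

In such a loop the address of prev is loop invariant but the index is not,
so hoisting the address computation alone only keeps one more register live
across the loop.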

The patch fixes the x86 address cost so that addresses using the GOT
register become cheaper again, as they were before enabling EBX.

In ix86_address_cost, the result of "REGNO (parts.base) >=
FIRST_PSEUDO_REGISTER" was always false for hard EBX. The patch makes the
condition give the same result when parts.base is the GOT register (and
likewise for parts.index).
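
The effect of the changed condition can be illustrated with a small
stand-alone model. This is a toy sketch, not GCC code: all "toy_" names and
the register numbers are made up for illustration, and the model assumes
both address parts are plain registers (it ignores the !REG_P case of the
real condition).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are not GCC's real types.  */
enum { TOY_EBX = 3, TOY_FIRST_PSEUDO = 16 };

struct toy_reg { int regno; };

struct toy_addr {
    const struct toy_reg *base;     /* NULL when the address has no base */
    const struct toy_reg *index;    /* NULL when the address has no index */
};

/* Does this register count towards the "two pseudos" penalty?  With the
   patch, the GOT register never counts, even when it is a pseudo.  */
static bool toy_counts(const struct toy_reg *r, const struct toy_reg *got,
                       bool patched)
{
    if (patched && got != NULL && got->regno == r->regno)
        return false;
    return r->regno >= TOY_FIRST_PSEUDO;
}

/* Returns 1 when the modelled condition would execute cost++, else 0.  */
int toy_two_reg_penalty(const struct toy_addr *a, const struct toy_reg *got,
                        bool patched)
{
    if (a->base != NULL && a->index != NULL && a->base != a->index
        && toy_counts(a->base, got, patched)
        && toy_counts(a->index, got, patched))
        return 1;
    return 0;
}
```

In this model, an address whose base is hard EBX gets no penalty, an address
whose base is a pseudo GOT register gets the penalty without the patch and
none with it, and an address built from two ordinary pseudos is penalized in
both cases.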

2014-10-08  Evgeny Stupachenko  <evstu...@gmail.com>
        * gcc/config/i386/i386.c (ix86_address_cost): Lower cost when
        the address contains the GOT register.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b43e870..9d8cfd1 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12497,8 +12497,12 @@ ix86_address_cost (rtx x, enum machine_mode, addr_space_t, bool)
     cost++;

   if (parts.base
+      && (!pic_offset_table_rtx
+         || REGNO (pic_offset_table_rtx) != REGNO (parts.base))
       && (!REG_P (parts.base) || REGNO (parts.base) >= FIRST_PSEUDO_REGISTER)
       && parts.index
+      && (!pic_offset_table_rtx
+         || REGNO (pic_offset_table_rtx) != REGNO (parts.index))
       && (!REG_P (parts.index) || REGNO (parts.index) >= FIRST_PSEUDO_REGISTER)
       && parts.base != parts.index)
     cost++;
