On Tue, May 26, 2015 at 04:37:46PM +0930, Alan Modra wrote:
> On powerpc64le, modifying the way combine treats function parameters
> and call arguments results in some regressions.
>
> For instance, this testcase from varasm.c
>
> extern int foo3 (void *, ...);
> extern void foo4 (void *, const char *);
> int
> emit_tls_common (void *decl,
>                  const char *name,
>                  unsigned long size)
> {
>   foo3 (0, "\t%s\t", "..");
>   foo4 (0, name);
>   foo3 (0, ",%lu,%u\n", size, ((unsigned int *)decl)[88] / 8);
>   return 1;
> }
>
> at -O2 produces for the prologue and first call
>
>   old                     new
>   mflr 0                  mflr 0
>   std 29,-24(1)           std 29,-24(1)
>   std 30,-16(1)           std 30,-16(1)
>   mr 29,4                 addis 9,2,.LC0@toc@ha
>   std 31,-8(1)            std 31,-8(1)
>   addis 4,2,.LC1@toc@ha   addis 10,2,.LC1@toc@ha
>   mr 31,5                 addi 9,9,.LC0@toc@l
>   addis 5,2,.LC0@toc@ha   addi 10,10,.LC1@toc@l
>   mr 30,3                 mr 30,3
>   addi 5,5,.LC0@toc@l     mr 29,4
>   addi 4,4,.LC1@toc@l     mr 31,5
>   li 3,0                  mr 4,10
>   std 0,16(1)             mr 5,9
>   stdu 1,-128(1)          std 0,16(1)
>   bl foo3                 stdu 1,-128(1)
>   nop                     li 3,0
>                           bl foo3
>                           nop
>
> As you can see, we have some extra register shuffling from keeping a
> pseudo for arg setup insns.  I guess the pseudos allow sched more
> freedom to mess around..
... and then RA isn't able to move things back.  I see this happening
with all three changes (return value, incoming args, outgoing args);
the changes to combine give sched1 and RA more freedom, but those then
end up generating lots of unnecessary register moves.

> On the positive side, I saw cases where keeping parameter pseudos
> allowed shrink-wrap to occur.  varasm.c:decode_reg_name_and_count is
> one of them.

More shrink-wrapping is a big win.

> Here's a case where changes at the return result in poorer code
>
> int
> decl_readonly_section_1 (int category)
> {
>   switch (category)
>     {
>     case 1:
>     case 2:
>     case 3:
>     case 4:
>     case 5:
>       return 1;
>     default:
>       return 0;
>     }
> }
>
>   old                 new
>   addi 9,3,-6         addi 9,3,-6
>   neg 3,3             neg 3,3
>   and 3,9,3           and 3,9,3
>   rldicl 3,3,33,63    srwi 3,3,31
>   blr                 rldicl 3,3,0,32
>                       blr
>
> Previously this:
>
> (insn 35 34 36 2 (set (reg:SI 161)
>         (lshiftrt:SI (reg:SI 164)
>             (const_int 31 [0x1f]))) {lshrsi3})
> (insn 36 35 23 2 (set (reg:DI 155 [ D.2441 ])
>         (zero_extend:DI (reg:SI 161))) {zero_extendsidi2})
> (insn 23 36 24 2 (set (reg/i:DI 3 3)
>         (reg:DI 155 [ D.2441 ])) {*movdi_internal64})
>
> is first combined to
>
> (insn 35 34 36 2 (set (reg:SI 161)
>         (lshiftrt:SI (reg:SI 164)
>             (const_int 31 [0x1f]))) {lshrsi3})
> (insn 23 35 24 2 (set (reg/i:DI 3 3)
>         (and:DI (subreg:DI (reg:SI 161) 0)
>             (const_int 1 [0x1]))))
>
> which is somewhat surprising, but from my previous forays into
> combine.c I'd say happens due to known zero bits.  (Just looking at
> dumps here, rather than in gdb.)
>
> Then the above is further combined to
>
> (insn 23 34 24 2 (set (reg/i:DI 3 3)
>         (zero_extract:DI (subreg:DI (reg:SI 164) 0)
>             (const_int 1 [0x1])
>             (const_int 32 [0x20]))))
>
> Looks to me like a missed optimization opportunity that insns 35 and
> 36 aren't combined without first going through the intermediate step.

The rs6000 backend doesn't have zero_extend variants of many of its
patterns, only some.  Well-known problem, long-term project.
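To make the "known zero bits" reasoning concrete: after the lshiftrt:SI
by 31, at most bit 0 of the result can be nonzero, so zero-extending
that result to DImode, and'ing a DImode view of it with 1, and
extracting bit 31 of the shift operand itself all compute the same
value (rs6000 has BITS_BIG_ENDIAN, so zero_extract position 32 in
DImode is bit 31 counting from the LSB).  A minimal standalone C check
of that equivalence, just illustrating the arithmetic:

#include <assert.h>
#include <stdint.h>

int
main (void)
{
  for (int i = 0; i < 32; i++)
    {
      uint32_t x = UINT32_C (1) << i;
      /* insns 35+36: zero_extend:DI (lshiftrt:SI x 31).  */
      uint64_t ext = (uint64_t) (x >> 31);
      /* first combination: and:DI (subreg:DI ...) 1; only bit 0 of
         the subreg survives the masking.  */
      uint64_t anded = ((uint64_t) (x >> 31)) & 1;
      /* final form: zero_extract:DI x 1 32, i.e. bit 31 from the
         LSB of x.  */
      uint64_t extracted = ((uint64_t) x >> 31) & 1;
      assert (ext == anded && anded == extracted);
    }
  return 0;
}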
> Anyway, here's the rewritten patch.  I've left in some knobs I used
> when testing in case you want to see for yourself what happens with
> various options.  Bootstrapped etc. powerpc64le-linux and
> x86_64-linux.
>
> +#define DONT_COMBINE_PARAMS 1
> +#define DONT_COMBINE_CALL_ARGS 1

I tested with all combinations of those knob settings, building Linux
kernels (mostly defconfigs); these are the resulting text sizes:

  master    alan00    alan10    alan01    alan11
 5432728   5432728   5433848   5435472   5436080  alpha
 3851131   3851391   3852495   3852567   3853755  arm
 2190716   2190716   2190716   2190708   2190708  blackfin
 2191439   2191503   2191983   2192335   2192751  c6x
 2213186   2213250   2213154   2213482   2213546  cris
 3322420   3322420   3322500   3322564   3322692  frv
10898664  10898664  10898664  10898664  10898664  i386
 3253459   3253539   3253599   3255235   3255331  m32r
 4708528   4708532   4709772   4708660   4709656  microblaze
 3949689   3949745   3950269   3952401   3952857  mips
 5693823   5693975   5694443   5697971   5698227  mips64
 2374664   2374708   2375126   2375485   2375841  mn10300
 7488219   7488419   7492299   7489743   7493431  parisc
 6195727   6195935   6196367   6221647   6221687  parisc64
 8688975   8689111   8691931   8692619   8695279  powerpc
20252077  20252337  20255185  20266897  20269477  powerpc64
11423734  11421858  11423946  11422726  11424838  s390
 6488342   6488406   6488534   6489110   6489302  sh64
 1545652   1545652   1545776   1545812   1545944  shnommu
 3737005   3736973   3737357   3737585   3737945  sparc
 6075342   6074426   6074762   6075570   6075834  sparc64
12301607  12301607  12301543  12301787  12301723  x86_64
 1954029   1954061   1954441   1948841   1949305  xtensa

As you see, almost everything regresses code size.  s390 and sparc and
xtensa like some of it; everything else generates more register moves.
A typical one for powerpc, with the "00" setting:

  before              after
  rlwinm 3,9,N,N,N    rlwinm 9,9,N,N,N
                      mr 3,9

(and then blr etc., no further uses of r3 or r9; nothing special about
rlwinm here, there also are "and" and "addi" cases, etc.)

I think we should at least ameliorate this regression before we can
apply any variant of this :-(


Segher