Re: [lra] patch to solve most scalability problems for LRA

Vladimir Makarov Wed, 10 Oct 2012 13:14:36 -0700

On 12-10-10 10:53 AM, Steven Bosscher wrote:

On Thu, Oct 4, 2012 at 5:37 PM, Vladimir Makarov wrote:

   The following patch solves most of LRA's scalability problems.

   It switches on simpler algorithms in LRA.  First, it switches off
trying to reassign hard registers to spilled pseudos (for such huge
functions they usually have long live ranges -- so the chance of
assigning them anything is very small, but trying to reassign them hard
registers is too expensive), inheritance, live range splitting, and
memory coalescing optimizations.  It seems that rematerialization is too
important for performance -- so I don't switch it off.  As splitting is
also necessary for generation of caller-saves code, I switch off
caller-saves in IRA and force IRA to do non-regional RA.

Hi Vlad,

I've revisited this patch now that parts of the scalability issues
have been resolved. Something funny happened for our
soon-to-be-legendary PR54146 test case...

lra-branch yesterday (i.e. without the elimination and constraints
speedup patches):
  integrated RA           : 145.26 (18%)
  LRA non-specific        :  46.94 ( 6%)
  LRA virtuals elimination:  51.56 ( 6%)
  LRA reload inheritance  :   0.03 ( 0%)
  LRA create live ranges  :  46.67 ( 6%)
  LRA hard reg assignment :   0.55 ( 0%)

lra-branch today + ira-speedup-1.diff:
  integrated RA           : 111.19 (15%) usr
  LRA non-specific        :  21.16 ( 3%) usr
  LRA virtuals elimination:   0.65 ( 0%) usr
  LRA reload inheritance  :   0.01 ( 0%) usr
  LRA create live ranges  :  56.33 ( 8%) usr
  LRA hard reg assignment :   0.58 ( 0%) usr

lra-branch today + ira-speedup-1.diff + rm-lra_simple_p.diff:
  integrated RA           :  89.43 (11%) usr
  LRA non-specific        :  21.43 ( 3%) usr
  LRA virtuals elimination:   0.61 ( 0%) usr
  LRA reload inheritance  :   6.10 ( 1%) usr
  LRA create live ranges  :  88.64 (11%) usr
  LRA hard reg assignment :  45.17 ( 6%) usr
  LRA coalesce pseudo regs:   2.24 ( 0%) usr

Note how IRA is *faster* without the lra_simple_p patch. The cost
comes back in "LRA hard reg assignment" and "LRA create live ranges"
where I assume the latter is a consequence of running
lra_create_live_ranges a few more times to work for the hard-reg
assignment phase.

Do you have an idea why IRA might be faster without the lra_simple_p
thing? Maybe there's a way to get the best of both...

  I have no idea.

  I cannot confirm it on an Intel Core i7 machine.  Here are my timings.
Removing lra_simple_p gives the worst compilation time, but the best
code size.

  It is also interesting that your IRA range patch results in
different code generation (I cannot explain that either for now). I saw
the same on a small test (black jack playing and betting strategy).

  Another interesting thing is that the IRA times are the same (with and
without simplified allocation for LRA).

--- branch this morning

integrated RA           :  48.41 (13%) usr   0.25 ( 3%) sys  48.72 (13%) wall  223608 kB (19%) ggc
LRA non-specific        :  14.47 ( 4%) usr   0.15 ( 2%) sys  14.57 ( 4%) wall   41443 kB ( 4%) ggc
LRA virtuals elimination:   0.40 ( 0%) usr   0.00 ( 0%) sys   0.41 ( 0%) wall   36037 kB ( 3%) ggc
LRA reload inheritance  :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall    1209 kB ( 0%) ggc
LRA create live ranges  :  17.37 ( 5%) usr   0.21 ( 3%) sys  17.56 ( 5%) wall    5182 kB ( 0%) ggc
LRA hard reg assignment :   1.77 ( 0%) usr   0.02 ( 0%) sys   1.76 ( 0%) wall       0 kB ( 0%) ggc
LRA coalesce pseudo regs:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
real=377.25 user=367.58 system=8.36 share=99%% maxrss=33540720 ins=280 outs=92544 mfaults=4448012 waits=17

   text    data     bss     dec     hex filename
6395340      16     607 6395963  61983b s.o
--- branch this morning + ira range patch

integrated RA           :  36.03 (10%) usr   0.03 ( 0%) sys  36.20 (10%) wall  223608 kB (19%) ggc
LRA non-specific        :  14.57 ( 4%) usr   0.14 ( 2%) sys  14.89 ( 4%) wall   41453 kB ( 4%) ggc
LRA virtuals elimination:   0.36 ( 0%) usr   0.01 ( 0%) sys   0.41 ( 0%) wall   36040 kB ( 3%) ggc
LRA reload inheritance  :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall    1210 kB ( 0%) ggc
LRA create live ranges  :  17.36 ( 5%) usr   0.21 ( 3%) sys  17.53 ( 5%) wall    5184 kB ( 0%) ggc
LRA hard reg assignment :   1.78 ( 1%) usr   0.02 ( 0%) sys   1.79 ( 0%) wall       0 kB ( 0%) ggc
LRA coalesce pseudo regs:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
TOTAL                   : 351.82             7.50           360.52            1149460 kB
real=362.68 user=353.65 system=7.84 share=99%% maxrss=33540432 ins=224 outs=92544 mfaults=4073281 waits=17

   text    data     bss     dec     hex filename
6395424      16     607 6396047  61988f s.o
--- branch this morning + ira range patch + removing lra_simple_p

integrated RA           :  37.87 ( 9%) usr   0.14 ( 2%) sys  38.30 ( 9%) wall  744114 kB (45%) ggc
LRA non-specific        :  13.52 ( 3%) usr   0.05 ( 1%) sys  13.60 ( 3%) wall   39171 kB ( 2%) ggc
LRA virtuals elimination:   0.38 ( 0%) usr   0.01 ( 0%) sys   0.40 ( 0%) wall   33096 kB ( 2%) ggc
LRA reload inheritance  :   3.31 ( 1%) usr   0.00 ( 0%) sys   3.36 ( 1%) wall    5217 kB ( 0%) ggc
LRA create live ranges  :  39.75 (10%) usr   0.42 ( 5%) sys  40.53 (10%) wall    5694 kB ( 0%) ggc
LRA hard reg assignment :  31.87 ( 8%) usr   0.03 ( 0%) sys  31.94 ( 8%) wall       0 kB ( 0%) ggc
LRA coalesce pseudo regs:   1.14 ( 0%) usr   0.00 ( 0%) sys   1.15 ( 0%) wall       0 kB ( 0%) ggc
real=424.69 user=414.47 system=8.06 share=99%% maxrss=36546048 ins=34992 outs=91528 mfaults=4253004 waits=175

   text    data     bss     dec     hex filename
6278007      16     607 6278630  5fcde6 s.o
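
The compile-time versus code-size trade-off between the last two runs can be put into numbers with a little arithmetic. The following is only a minimal Python sketch using the user times and text sizes reported above; the dictionary keys and the helper name pct_change are illustrative and not part of GCC or of the patches.

```python
# User time (seconds) and text size (bytes) copied from the runs above.
# The labels are informal names chosen for this sketch.
runs = {
    "ira-range": {"user": 353.65, "text": 6395424},
    "ira-range, no lra_simple_p": {"user": 414.47, "text": 6278007},
}


def pct_change(new, old):
    """Relative change of `new` versus `old`, in percent."""
    return (new - old) / old * 100.0


base = runs["ira-range"]
full = runs["ira-range, no lra_simple_p"]

time_delta = pct_change(full["user"], base["user"])
size_delta = pct_change(full["text"], base["text"])

print(f"compile time: {time_delta:+.1f}%")
print(f"text size:    {size_delta:+.2f}%")
```

On these figures, removing lra_simple_p costs roughly 17% more user time and yields a text section roughly 1.8% smaller.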
