On 12-10-10 10:53 AM, Steven Bosscher wrote:
On Thu, Oct 4, 2012 at 5:37 PM, Vladimir Makarov wrote:
   The following patch solves most of LRA's scalability problems.

   It switches on simpler algorithms in LRA.  First, it switches off
trying to reassign hard registers to spilled pseudos (in such huge
functions these pseudos usually have long live ranges, so the chance of
assigning them anything is very small, while trying to reassign them
hard registers is too expensive), inheritance, live range splitting, and
memory coalescing optimizations.  Rematerialization seems too important
for performance, so I don't switch it off.  As splitting is also
necessary for generating caller-saves code, I switch off caller-saves
in IRA and force IRA to do non-regional RA.
Hi Vlad,

I've revisited this patch now that parts of the scalability issues
have been resolved. Something funny happened for our
soon-to-be-legendary PR54146 test case...

lra-branch yesterday (i.e. without the elimination and constraints
speedup patches):
  integrated RA           : 145.26 (18%)
  LRA non-specific        :  46.94 ( 6%)
  LRA virtuals elimination:  51.56 ( 6%)
  LRA reload inheritance  :   0.03 ( 0%)
  LRA create live ranges  :  46.67 ( 6%)
  LRA hard reg assignment :   0.55 ( 0%)

lra-branch today + ira-speedup-1.diff:
  integrated RA           : 111.19 (15%) usr
  LRA non-specific        :  21.16 ( 3%) usr
  LRA virtuals elimination:   0.65 ( 0%) usr
  LRA reload inheritance  :   0.01 ( 0%) usr
  LRA create live ranges  :  56.33 ( 8%) usr
  LRA hard reg assignment :   0.58 ( 0%) usr

lra-branch today + ira-speedup-1.diff + rm-lra_simple_p.diff:
  integrated RA           :  89.43 (11%) usr
  LRA non-specific        :  21.43 ( 3%) usr
  LRA virtuals elimination:   0.61 ( 0%) usr
  LRA reload inheritance  :   6.10 ( 1%) usr
  LRA create live ranges  :  88.64 (11%) usr
  LRA hard reg assignment :  45.17 ( 6%) usr
  LRA coalesce pseudo regs:   2.24 ( 0%) usr

Note how IRA is *faster* without the lra_simple_p patch. The cost
comes back in "LRA hard reg assignment" and "LRA create live ranges",
where I assume the latter is a consequence of running
lra_create_live_ranges a few more times for the hard-reg assignment
phase.

Do you have an idea why IRA might be faster without the lra_simple_p
thing? Maybe there's a way to get the best of both...

  I have no idea.

  I cannot confirm it on an Intel Core i7 machine.  Here are my timings.
Removing lra_simple_p gives the worst compilation time but the best
code size.

  It is also interesting that your IRA range patch changes the generated
code (I cannot explain that right now either).  I saw the same thing on
a small test (a blackjack playing and betting strategy program).

  Another interesting thing is that the IRA times are about the same
with and without the simplified allocation for LRA.

--- branch this morning
  integrated RA           :  48.41 (13%) usr   0.25 ( 3%) sys  48.72 (13%) wall  223608 kB (19%) ggc
  LRA non-specific        :  14.47 ( 4%) usr   0.15 ( 2%) sys  14.57 ( 4%) wall   41443 kB ( 4%) ggc
  LRA virtuals elimination:   0.40 ( 0%) usr   0.00 ( 0%) sys   0.41 ( 0%) wall   36037 kB ( 3%) ggc
  LRA reload inheritance  :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall    1209 kB ( 0%) ggc
  LRA create live ranges  :  17.37 ( 5%) usr   0.21 ( 3%) sys  17.56 ( 5%) wall    5182 kB ( 0%) ggc
  LRA hard reg assignment :   1.77 ( 0%) usr   0.02 ( 0%) sys   1.76 ( 0%) wall       0 kB ( 0%) ggc
  LRA coalesce pseudo regs:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
real=377.25 user=367.58 system=8.36 share=99%% maxrss=33540720 ins=280 outs=92544 mfaults=4448012 waits=17
   text    data     bss     dec     hex filename
6395340      16     607 6395963  61983b s.o
--- branch this morning + ira range patch
  integrated RA           :  36.03 (10%) usr   0.03 ( 0%) sys  36.20 (10%) wall  223608 kB (19%) ggc
  LRA non-specific        :  14.57 ( 4%) usr   0.14 ( 2%) sys  14.89 ( 4%) wall   41453 kB ( 4%) ggc
  LRA virtuals elimination:   0.36 ( 0%) usr   0.01 ( 0%) sys   0.41 ( 0%) wall   36040 kB ( 3%) ggc
  LRA reload inheritance  :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall    1210 kB ( 0%) ggc
  LRA create live ranges  :  17.36 ( 5%) usr   0.21 ( 3%) sys  17.53 ( 5%) wall    5184 kB ( 0%) ggc
  LRA hard reg assignment :   1.78 ( 1%) usr   0.02 ( 0%) sys   1.79 ( 0%) wall       0 kB ( 0%) ggc
  LRA coalesce pseudo regs:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
  TOTAL                   : 351.82             7.50           360.52            1149460 kB
real=362.68 user=353.65 system=7.84 share=99%% maxrss=33540432 ins=224 outs=92544 mfaults=4073281 waits=17
   text    data     bss     dec     hex filename
6395424      16     607 6396047  61988f s.o
--- branch this morning + ira range patch + removing lra_simple_p
  integrated RA           :  37.87 ( 9%) usr   0.14 ( 2%) sys  38.30 ( 9%) wall  744114 kB (45%) ggc
  LRA non-specific        :  13.52 ( 3%) usr   0.05 ( 1%) sys  13.60 ( 3%) wall   39171 kB ( 2%) ggc
  LRA virtuals elimination:   0.38 ( 0%) usr   0.01 ( 0%) sys   0.40 ( 0%) wall   33096 kB ( 2%) ggc
  LRA reload inheritance  :   3.31 ( 1%) usr   0.00 ( 0%) sys   3.36 ( 1%) wall    5217 kB ( 0%) ggc
  LRA create live ranges  :  39.75 (10%) usr   0.42 ( 5%) sys  40.53 (10%) wall    5694 kB ( 0%) ggc
  LRA hard reg assignment :  31.87 ( 8%) usr   0.03 ( 0%) sys  31.94 ( 8%) wall       0 kB ( 0%) ggc
  LRA coalesce pseudo regs:   1.14 ( 0%) usr   0.00 ( 0%) sys   1.15 ( 0%) wall       0 kB ( 0%) ggc
real=424.69 user=414.47 system=8.06 share=99%% maxrss=36546048 ins=34992 outs=91528 mfaults=4253004 waits=175
   text    data     bss     dec     hex filename
6278007      16     607 6278630  5fcde6 s.o
