Re: RFC: LRA for x86/x86-64 [0/9]

Steven Bosscher Thu, 04 Oct 2012 11:44:00 -0700

On Sat, Sep 29, 2012 at 10:26 PM, Steven Bosscher <stevenb....@gmail.com> wrote:
> To put it in another perspective, here are my timings of trunk vs lra
> (both checkouts done today):
>
> trunk:
>  integrated RA           : 181.68 (24%) usr   1.68 (11%) sys 183.43 (24%) wall  643564 kB (20%) ggc
>  reload                  :  11.00 ( 1%) usr   0.18 ( 1%) sys  11.17 ( 1%) wall   32394 kB ( 1%) ggc
>  TOTAL                 : 741.64            14.76           756.41           3216164 kB
>
> lra branch:
>  integrated RA           : 174.65 (16%) usr   1.33 ( 8%) sys 176.33 (16%) wall  643560 kB (20%) ggc
>  reload                  : 399.69 (36%) usr   2.48 (15%) sys 402.69 (36%) wall   41852 kB ( 1%) ggc
>  TOTAL                 :1102.06            16.05          1120.83           3231738 kB
>
> That's a 49% slowdown. The difference is completely accounted for by
> the timing difference between reload and LRA.
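
As a quick arithmetic check of the 49% figure, the sketch below recomputes it from the user-time TOTAL rows and compares the growth of the "reload" row (read here, as the quoted text implies, as the row where LRA's time is charged on the branch) with the growth of the total. The variable names are illustrative only.

```python
# TOTAL and "reload" rows from the two -ftime-report summaries (user seconds).
trunk_total_usr = 741.64
lra_total_usr = 1102.06
trunk_reload_usr = 11.00   # "reload" row on trunk
lra_reload_usr = 399.69    # "reload" row on the LRA branch (LRA time, per the text)

# Relative slowdown of the whole compilation.
slowdown = lra_total_usr / trunk_total_usr - 1.0
# Absolute growth of the total vs. growth of the reload/LRA row.
total_delta = lra_total_usr - trunk_total_usr
reload_delta = lra_reload_usr - trunk_reload_usr

print(f"slowdown (usr): {slowdown:.1%}")
print(f"total delta:    {total_delta:.2f} s")
print(f"reload delta:   {reload_delta:.2f} s")
```

This reports a user-time slowdown of about 48.6%, a total increase of 360.42 s and a reload-row increase of 388.69 s. The reload delta exceeds the total delta because other rows, including integrated RA, went down slightly at the same time; wall-clock totals give a similar slowdown (about 48.2%).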


With Vlad's patch to switch off expensive LRA parts for extreme
functions ([lra revision 192093]), the numbers are:

 integrated RA           : 154.27 (17%) usr   1.27 ( 8%) sys 155.64 (17%) wall  131534 kB ( 5%) ggc
 LRA non-specific        :  69.67 ( 8%) usr   0.79 ( 5%) sys  70.40 ( 8%) wall   18805 kB ( 1%) ggc
 LRA virtuals elimination:  55.53 ( 6%) usr   0.00 ( 0%) sys  55.49 ( 6%) wall   20465 kB ( 1%) ggc
 LRA reload inheritance  :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      57 kB ( 0%) ggc
 LRA create live ranges  :  80.46 ( 4%) usr   1.05 ( 6%) sys  81.49 ( 4%) wall    2459 kB ( 0%) ggc
 LRA hard reg assignment :   1.78 ( 0%) usr   0.05 ( 0%) sys   1.85 ( 0%) wall       0 kB ( 0%) ggc
 reload                  :   6.38 ( 1%) usr   0.13 ( 1%) sys   6.51 ( 1%) wall       0 kB ( 0%) ggc
 TOTAL                 : 917.42            16.35           933.78           2720151 kB

Recalling trunk total time (r191835):

>  TOTAL                 : 741.64            14.76           756.41

the slowdown due to LRA is down from 49% to 23%, and there is still room
for improvement (even without crippling LRA further). With the expensive
LRA parts switched off, code size is still better than on trunk:
$ size slow.o*
   text    data     bss     dec     hex filename
3499938       8     583 3500529  3569f1 slow.o.00_trunk_r191835
3386117       8     583 3386708  33ad54 slow.o.01_lra_r191626
3439755       8     583 3440346  347eda slow.o.02_lra_r192093
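
The same arithmetic for the r192093 timings and the `size` output can be sketched as follows; `pct_change` is a hypothetical helper, and all figures are copied from the TOTAL rows and the `size` listing in this message.

```python
def pct_change(new: float, base: float) -> float:
    """Hypothetical helper: percent change of `new` relative to `base`."""
    return (new / base - 1.0) * 100.0

# TOTAL rows (seconds): trunk r191835 vs. lra r192093
trunk_usr, trunk_wall = 741.64, 756.41
lra_usr, lra_wall = 917.42, 933.78

# "text" column of `size slow.o*` (bytes)
text_sizes = {
    "slow.o.00_trunk_r191835": 3499938,
    "slow.o.01_lra_r191626": 3386117,
    "slow.o.02_lra_r192093": 3439755,
}
trunk_text = text_sizes["slow.o.00_trunk_r191835"]

print(f"slowdown (usr):  {pct_change(lra_usr, trunk_usr):+.1f}%")
print(f"slowdown (wall): {pct_change(lra_wall, trunk_wall):+.1f}%")
for name, text in text_sizes.items():
    print(f"{name}: {pct_change(text, trunk_text):+.2f}% text vs trunk")
```

This gives roughly +23.7% (user) and +23.4% (wall) compile time, and text-size reductions of about 3.25% for r191626 and 1.72% for r192093 relative to trunk.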

The LRA branch outperforms trunk on everything else I've thrown at it,
at least in compile time and code size, and also in runtime on e.g. the
Polyhedron Fortran benchmarks.

Ciao!
Steven
