[patch][lra] Improve initial program point density in lra-lives.c (was: Re: RFC: LRA for x86/x86-64 [7/9])

Steven Bosscher Wed, 03 Oct 2012 08:36:11 -0700

On Wed, Oct 3, 2012 at 4:56 PM, Vladimir Makarov <[email protected]> wrote:
> On 12-10-03 3:13 AM, Steven Bosscher wrote:
>>
>> On Tue, Oct 2, 2012 at 3:42 PM, Richard Sandiford
>> <[email protected]> wrote:
>>>>
>>>> +/* Compress pseudo live ranges by removing program points where
>>>> +   nothing happens.  Complexity of many algorithms in LRA is linear
>>>> +   function of program points number.  To speed up the code we try to
>>>> +   minimize the number of the program points here.  */
>>>> +static void
>>>> +remove_some_program_points_and_update_live_ranges (void)
>>>
>>> Genuine question, but could we do this on the fly instead,
>>> by not incrementing curr_point if the current point had no value?
>>>
>>> I suppose the main complication would be checking cases where
>>> all births are recorded by extending the previous just-closed live
>>> range rather than starting a new one, in which case it's the previous
>>> point that needs to be reused.  Hmm...
>>
>> It does seem to be something worth investigating further. Things like:
>>
>> Compressing live ranges: from 1742579 to 554532 - 31%
>> Compressing live ranges: from 1742569 to 73069 - 4%
>>
>> look extreme, but they're actually the norm. For the same test case
>> (PR54146 still) but looking only at functions with initially 1000-9999
>> program points, you get this picture:
>>
>> Compressing live ranges: from 1766 to 705 - 39%
>> Compressing live ranges: from 1449 to 370 - 25%
>> Compressing live ranges: from 3939 to 1093 - 27%
>> Compressing live ranges: from 3939 to 1093 - 27%
>> Compressing live ranges: from 3939 to 1093 - 27%
>> Compressing live ranges: from 3939 to 1093 - 27%
>> Compressing live ranges: from 2464 to 676 - 27%
>> Compressing live ranges: from 1433 to 379 - 26%
>> Compressing live ranges: from 1261 to 348 - 27%
>> Compressing live ranges: from 2835 to 755 - 26%
>> Compressing live ranges: from 5426 to 1678 - 30%
>> Compressing live ranges: from 5227 to 1477 - 28%
>> Compressing live ranges: from 1845 to 467 - 25%
>> Compressing live ranges: from 4868 to 1378 - 28%
>> Compressing live ranges: from 4875 to 1388 - 28%
>> Compressing live ranges: from 4882 to 1384 - 28%
>> Compressing live ranges: from 5688 to 1714 - 30%
>> Compressing live ranges: from 4943 to 1310 - 26%
>> Compressing live ranges: from 2976 to 792 - 26%
>> Compressing live ranges: from 5463 to 1526 - 27%
>> Compressing live ranges: from 2854 to 730 - 25%
>> Compressing live ranges: from 1810 to 745 - 41%
>> Compressing live ranges: from 2771 to 904 - 32%
>> Compressing live ranges: from 4916 to 1429 - 29%
>> Compressing live ranges: from 6505 to 2238 - 34%
>> Compressing live ranges: from 6493 to 166 - 2%
>> Compressing live ranges: from 5498 to 1734 - 31%
>> Compressing live ranges: from 1810 to 745 - 41%
>> Compressing live ranges: from 5043 to 1420 - 28%
>> Compressing live ranges: from 3094 to 788 - 25%
>> Compressing live ranges: from 4563 to 1311 - 28%
>> Compressing live ranges: from 4557 to 158 - 3%
>> Compressing live ranges: from 1050 to 274 - 26%
>> Compressing live ranges: from 1602 to 434 - 27%
>> Compressing live ranges: from 2474 to 600 - 24%
>> Compressing live ranges: from 2718 to 866 - 31%
>> Compressing live ranges: from 2097 to 716 - 34%
>> Compressing live ranges: from 4152 to 1099 - 26%
>> Compressing live ranges: from 5065 to 1514 - 29%
>> Compressing live ranges: from 1236 to 359 - 29%
>> Compressing live ranges: from 1722 to 541 - 31%
>> Compressing live ranges: from 5186 to 1401 - 27%
>>
>> Unfortunately the dump doesn't mention how many live ranges could be
>> merged thanks to the compression.
>>
>> It'd also be good to understand why the compression ratios are so
>> small, and consistently around ~30%.
>
> This sentence is not clear to me.  30% means 3 times less points. It is
> pretty good compression.
>
>>   Maybe curr_point includes things
>> it should ignore (DEBUG_INSNs, NOTEs, ...)?
>>
> After the compression, there are only points important for conflict info,
> i.e. only points where some reg dies or start living.  Even more if on the
> subsequent points there are only life starts or only deaths,  they are
> represented by one point after the compression.


With this patch the amount of compression is reduced. Without the
patch the compression ratio is typically around 30%, with the patch
this goes up to ~65%. The worst compression ratios (where worse is
better in this case :-) are:

$ grep Compressing log4 | egrep " [78].%"
Compressing live ranges: from 31 to 22 - 70%, pre_count 17, post_count 15
Compressing live ranges: from 128 to 94 - 73%, pre_count 87, post_count 77
Compressing live ranges: from 32 to 23 - 71%, pre_count 16, post_count 15
Compressing live ranges: from 38 to 29 - 76%, pre_count 19, post_count 18
Compressing live ranges: from 40 to 28 - 70%, pre_count 26, post_count 24
Compressing live ranges: from 40 to 28 - 70%, pre_count 26, post_count 24
Compressing live ranges: from 40 to 28 - 70%, pre_count 26, post_count 24
Compressing live ranges: from 60 to 43 - 71%, pre_count 37, post_count 33
Compressing live ranges: from 60 to 43 - 71%, pre_count 37, post_count 33
Compressing live ranges: from 125 to 89 - 71%, pre_count 71, post_count 62
Compressing live ranges: from 10 to 8 - 80%, pre_count 6, post_count 6
Compressing live ranges: from 54 to 39 - 72%, pre_count 35, post_count 30
Compressing live ranges: from 60 to 43 - 71%, pre_count 37, post_count 33
Compressing live ranges: from 60 to 43 - 71%, pre_count 37, post_count 33
Compressing live ranges: from 194 to 143 - 73%, pre_count 127, post_count 114
Compressing live ranges: from 2804 to 2047 - 73%, pre_count 2932,
post_count 2841
Compressing live ranges: from 194 to 143 - 73%, pre_count 127, post_count 114
Compressing live ranges: from 2184 to 1533 - 70%, pre_count 1458,
post_count 1366
Compressing live ranges: from 20 to 16 - 80%, pre_count 10, post_count 10

(pre_count and post_count are the number of live ranges before/after
compressing)

The "worst" result is this:
Compressing live ranges: from 726174 to 64496 - 8%, pre_count
40476128, post_count 12483414

But that's still a lot better than before the patch for the same function:
Compressing live ranges: from 1742569 to 73069 - 4%, pre_count
40842330, post_count 12479992

I'm not sure why I end up with fewer program points overall, I had
expected that remove_some_program_points_and_update_live_ranges would
make the post-compression numbers the same for the unpatched and
patched compiler. But oh well... There is no difference for >90% of
the cases so probably I just happen to trigger a few extra merges that
remove_some_program_points_and_update_live_ranges before the patch
somehow overlooked and kept disjoint.

Unfortunately, and much to my chagrin, there is almost no pay-off on
compile time. This is not completely unexpected, because the most
costly loops in e.g. remove_some_program_points_and_update_live_ranges
are unchanged (loop over all live ranges for all pseudos). Timings
before ("log") and after ("log4") the patch:

$ grep "LRA " log{,4} | egrep -o "^.*usr"
log: LRA non-specific        :  60.94 ( 5%) usr
log: LRA virtuals elimination:  60.24 ( 5%) usr
log: LRA reload inheritance  :   6.25 ( 1%) usr
log: LRA create live ranges  : 181.79 (16%) usr
log: LRA hard reg assignment : 132.33 (11%) usr
log: LRA coalesce pseudo regs:   2.48 ( 0%) usr
log4: LRA non-specific        :  60.52 ( 5%) usr
log4: LRA virtuals elimination:  62.44 ( 5%) usr
log4: LRA reload inheritance  :   6.45 ( 1%) usr
log4: LRA create live ranges  : 177.69 (15%) usr
log4: LRA hard reg assignment : 131.76 (11%) usr
log4: LRA coalesce pseudo regs:   2.60 ( 0%) usr

I was trying to make the
remove_some_program_points_and_update_live_ranges compression step
almost unnecessary (at least for the first iteration of
lra_create_live_ranges) because that would give a ~25% speedup for
"LRA create live ranges" for the PR54146 test case. But even with this
patch there are no functions for which
remove_some_program_points_and_update_live_ranges becomes a no-op :-(

Still, I would like to propose this for the lra-branch. Hopefully it's
possible to find out how to get closer to 100%, there is a potential
win there.

Bootstrapped and tested on x86_64-unknown-linux-gnu.
I also checked and double-checked that there are no code generation
differences on all pre-processed source files of cc1.

OK for lra-branch?

Ciao!
Steven

lra-lives-cheaper-recompute.diff
Description: Binary data

[patch][lra] Improve initial program point density in lra-lives.c (was: Re: RFC: LRA for x86/x86-64 [7/9])

Reply via email to