On Wed, Oct 3, 2012 at 4:56 PM, Vladimir Makarov <vmaka...@redhat.com> wrote: > On 12-10-03 3:13 AM, Steven Bosscher wrote: >> >> On Tue, Oct 2, 2012 at 3:42 PM, Richard Sandiford >> <rdsandif...@googlemail.com> wrote: >>>> >>>> +/* Compress pseudo live ranges by removing program points where >>>> + nothing happens. Complexity of many algorithms in LRA is linear >>>> + function of program points number. To speed up the code we try to >>>> + minimize the number of the program points here. */ >>>> +static void >>>> +remove_some_program_points_and_update_live_ranges (void) >>> >>> Genuine question, but could we do this on the fly instead, >>> by not incrementing curr_point if the current point had no value? >>> >>> I suppose the main complication would be checking cases where >>> all births are recorded by extending the previous just-closed live >>> range rather than starting a new one, in which case it's the previous >>> point that needs to be reused. Hmm... >> >> It does seem to be something worth investigating further. Things like: >> >> Compressing live ranges: from 1742579 to 554532 - 31% >> Compressing live ranges: from 1742569 to 73069 - 4% >> >> look extreme, but they're actually the norm. For the same test case >> (PR54146 still) but looking only at functions with initially 1000-9999 >> program points, you get this picture: >> >> Compressing live ranges: from 1766 to 705 - 39% >> Compressing live ranges: from 1449 to 370 - 25% >> Compressing live ranges: from 3939 to 1093 - 27% >> Compressing live ranges: from 3939 to 1093 - 27% >> Compressing live ranges: from 3939 to 1093 - 27% >> Compressing live ranges: from 3939 to 1093 - 27% >> Compressing live ranges: from 2464 to 676 - 27% >> Compressing live ranges: from 1433 to 379 - 26% >> Compressing live ranges: from 1261 to 348 - 27% >> Compressing live ranges: from 2835 to 755 - 26% >> Compressing live ranges: from 5426 to 1678 - 30% >> Compressing live ranges: from 5227 to 1477 - 28% >> Compressing live ranges: from 1845 to 467 - 25% >> Compressing live ranges: from 4868 to 1378 - 28% >> Compressing live ranges: from 4875 to 1388 - 28% >> Compressing live ranges: from 4882 to 1384 - 28% >> Compressing live ranges: from 5688 to 1714 - 30% >> Compressing live ranges: from 4943 to 1310 - 26% >> Compressing live ranges: from 2976 to 792 - 26% >> Compressing live ranges: from 5463 to 1526 - 27% >> Compressing live ranges: from 2854 to 730 - 25% >> Compressing live ranges: from 1810 to 745 - 41% >> Compressing live ranges: from 2771 to 904 - 32% >> Compressing live ranges: from 4916 to 1429 - 29% >> Compressing live ranges: from 6505 to 2238 - 34% >> Compressing live ranges: from 6493 to 166 - 2% >> Compressing live ranges: from 5498 to 1734 - 31% >> Compressing live ranges: from 1810 to 745 - 41% >> Compressing live ranges: from 5043 to 1420 - 28% >> Compressing live ranges: from 3094 to 788 - 25% >> Compressing live ranges: from 4563 to 1311 - 28% >> Compressing live ranges: from 4557 to 158 - 3% >> Compressing live ranges: from 1050 to 274 - 26% >> Compressing live ranges: from 1602 to 434 - 27% >> Compressing live ranges: from 2474 to 600 - 24% >> Compressing live ranges: from 2718 to 866 - 31% >> Compressing live ranges: from 2097 to 716 - 34% >> Compressing live ranges: from 4152 to 1099 - 26% >> Compressing live ranges: from 5065 to 1514 - 29% >> Compressing live ranges: from 1236 to 359 - 29% >> Compressing live ranges: from 1722 to 541 - 31% >> Compressing live ranges: from 5186 to 1401 - 27% >> >> Unfortunately the dump doesn't mention how many live ranges could be >> merged thanks to the compression. >> >> It'd also be good to understand why the compression ratios are so >> small, and consistently around ~30%. > > This sentence is not clear to me. 30% means 3 times less points. It is > pretty good compression. > >> Maybe curr_point includes things >> it should ignore (DEBUG_INSNs, NOTEs, ...)? >> > After the compression, there are only points important for conflict info, > i.e. only points where some reg dies or start living. Even more if on the > subsequent points there are only life starts or only deaths, they are > represented by one point after the compression.
With this patch the amount of compression is reduced. Without the patch the compression ratio is typically around 30%, with the patch this goes up to ~65%. The worst compression ratios (where worse is better in this case :-) are: $ grep Compressing log4 | egrep " [78].%" Compressing live ranges: from 31 to 22 - 70%, pre_count 17, post_count 15 Compressing live ranges: from 128 to 94 - 73%, pre_count 87, post_count 77 Compressing live ranges: from 32 to 23 - 71%, pre_count 16, post_count 15 Compressing live ranges: from 38 to 29 - 76%, pre_count 19, post_count 18 Compressing live ranges: from 40 to 28 - 70%, pre_count 26, post_count 24 Compressing live ranges: from 40 to 28 - 70%, pre_count 26, post_count 24 Compressing live ranges: from 40 to 28 - 70%, pre_count 26, post_count 24 Compressing live ranges: from 60 to 43 - 71%, pre_count 37, post_count 33 Compressing live ranges: from 60 to 43 - 71%, pre_count 37, post_count 33 Compressing live ranges: from 125 to 89 - 71%, pre_count 71, post_count 62 Compressing live ranges: from 10 to 8 - 80%, pre_count 6, post_count 6 Compressing live ranges: from 54 to 39 - 72%, pre_count 35, post_count 30 Compressing live ranges: from 60 to 43 - 71%, pre_count 37, post_count 33 Compressing live ranges: from 60 to 43 - 71%, pre_count 37, post_count 33 Compressing live ranges: from 194 to 143 - 73%, pre_count 127, post_count 114 Compressing live ranges: from 2804 to 2047 - 73%, pre_count 2932, post_count 2841 Compressing live ranges: from 194 to 143 - 73%, pre_count 127, post_count 114 Compressing live ranges: from 2184 to 1533 - 70%, pre_count 1458, post_count 1366 Compressing live ranges: from 20 to 16 - 80%, pre_count 10, post_count 10 (pre_count and post_count are the number of live ranges before/after compressing) The "worst" result is this: Compressing live ranges: from 726174 to 64496 - 8%, pre_count 40476128, post_count 12483414 But that's still a lot better than before the patch for the same function: Compressing live ranges: from 1742569 to 73069 - 4%, pre_count 40842330, post_count 12479992 I'm not sure why I end up with fewer program points overall, I had expected that remove_some_program_points_and_update_live_ranges would make the post-compression numbers the same for the unpatched and patched compiler. But oh well... There is no difference for >90% of the cases so probably I just happen to trigger a few extra merges that remove_some_program_points_and_update_live_ranges before the patch somehow overlooked and kept disjoint. Unfortunately, and much to my chagrin, there is almost no pay-off on compile time. This is not completely unexpected, because the most costly loops in e.g. remove_some_program_points_and_update_live_ranges are unchanged (loop over all live ranges for all pseudos). Timings before ("log") and after ("log4") the patch: $ grep "LRA " log{,4} | egrep -o "^.*usr" log: LRA non-specific : 60.94 ( 5%) usr log: LRA virtuals elimination: 60.24 ( 5%) usr log: LRA reload inheritance : 6.25 ( 1%) usr log: LRA create live ranges : 181.79 (16%) usr log: LRA hard reg assignment : 132.33 (11%) usr log: LRA coalesce pseudo regs: 2.48 ( 0%) usr log4: LRA non-specific : 60.52 ( 5%) usr log4: LRA virtuals elimination: 62.44 ( 5%) usr log4: LRA reload inheritance : 6.45 ( 1%) usr log4: LRA create live ranges : 177.69 (15%) usr log4: LRA hard reg assignment : 131.76 (11%) usr log4: LRA coalesce pseudo regs: 2.60 ( 0%) usr I was trying to make the remove_some_program_points_and_update_live_ranges compression step almost unnecessary (at least for the first iteration of lra_create_live_ranges) because that would give a ~25% speedup for "LRA create live ranges" for the PR54146 test case. But even with this patch there are no functions for which remove_some_program_points_and_update_live_ranges becomes a no-op :-( Still, I would like to propose this for the lra-branch. Hopefully it's possible to find out how to get closer to 100%, there is a potential win there. Bootstrapped and tested on x86_64-unknown-linux-gnu. I also checked and double-checked that there are no code generation differences on all pre-processed source files of cc1. OK for lra-branch? Ciao! Steven
lra-lives-cheaper-recompute.diff
Description: Binary data