On 12-09-30 1:03 PM, Richard Guenther wrote:
On Sun, Sep 30, 2012 at 6:52 PM, Steven Bosscher <stevenb....@gmail.com> wrote:
Hi,


To look at it in yet another way:

  integrated RA           : 189.34 (16%) usr
  LRA non-specific        :  59.82 ( 5%) usr
  LRA virtuals elimination:  56.79 ( 5%) usr
  LRA create live ranges  : 175.30 (15%) usr
  LRA hard reg assignment : 130.85 (11%) usr

The IRA pass is slower than the next-slowest pass (tree PRE) by almost
a factor of 2.5.  Each of the individually-measured *phases* of LRA is
slower than the complete IRA *pass*.  These 5 timevars together make up
52% of all compile time.
That figure indeed makes IRA + LRA look bad.  Did you by chance identify
anything obvious that can be done to improve the situation?


As I wrote, I don't see that LRA has a problem right now, because even on an 8GB machine GCC with LRA is 10% faster than GCC with reload from a real-time point of view (not to mention that LRA generates 15% smaller code). And real time is what really matters for users.

But I think that the LRA CPU time problem for this test can be fixed. However, I don't think I can fix it within 2 weeks. So if people believe that the current LRA behaviour on this PR is a stopper for including it in gcc4.8, then we should postpone its inclusion until gcc4.9, when I hope to fix it.

As for IRA, it uses the Chaitin-Briggs algorithm, which scales worse than most other optimizations. So the bigger the test, the bigger the percentage of compilation time spent in IRA. I don't believe that somebody can achieve better code using other, faster RA algorithms. LLVM has no such problem because even its new RA (a big improvement for llvm3.0) is not based on the CB algorithm; it is still a modification of linear-scan RA. It would be interesting to check how other compilers behave on this test. The Intel compiler would be the most interesting (but I have doubts that it would do well, because I have seen programs where the Intel compiler with optimizations struggled for more than 40 minutes compiling a single file).
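
To make the scaling difference concrete, here is a minimal, self-contained sketch (in no way GCC's or LLVM's actual code; LiveRange, build_interference and linear_scan are made-up names): a Chaitin-Briggs style allocator has to build an explicit interference graph, which is roughly quadratic in the number of live ranges, while a linear-scan allocator only sorts the ranges and sweeps them once.

/* A minimal sketch, not GCC's or LLVM's actual code: LiveRange,
   build_interference and linear_scan are illustrative names only.  */
#include <cstddef>
#include <algorithm>
#include <utility>
#include <vector>

struct LiveRange
{
  int start;  /* first program point where the pseudo is live */
  int end;    /* last program point where the pseudo is live */
};

/* Chaitin-Briggs style: build an explicit interference graph.  The
   pairwise overlap test alone is O(n^2) in the number of live ranges,
   before simplify/select coloring even starts, which is why graph
   coloring dominates compile time on huge functions.  */
static std::vector<std::vector<int> >
build_interference (const std::vector<LiveRange> &lr)
{
  std::vector<std::vector<int> > adj (lr.size ());
  for (std::size_t i = 0; i < lr.size (); ++i)
    for (std::size_t j = i + 1; j < lr.size (); ++j)
      if (lr[i].start <= lr[j].end && lr[j].start <= lr[i].end)
        {
          adj[i].push_back ((int) j);
          adj[j].push_back ((int) i);
        }
  return adj;
}

static bool
start_before (const LiveRange &a, const LiveRange &b)
{
  return a.start < b.start;
}

/* Linear-scan style: sort the ranges once and sweep them with an active
   list, roughly O(n log n).  It scales much better, but the allocator
   never sees global interference, which is why its code quality usually
   trails CB.  (For simplicity this sketch spills the current range when
   no register is free, instead of the range with the furthest end.)  */
static std::vector<int>
linear_scan (std::vector<LiveRange> lr, int num_hard_regs)
{
  std::sort (lr.begin (), lr.end (), start_before);
  std::vector<int> assignment (lr.size (), -1);  /* -1 means spilled */
  std::vector<std::pair<int, int> > active;      /* (end point, hard reg) */
  std::vector<int> free_regs;
  for (int r = 0; r < num_hard_regs; ++r)
    free_regs.push_back (r);

  for (std::size_t i = 0; i < lr.size (); ++i)
    {
      /* Expire active ranges that ended before this one starts.  */
      for (std::size_t a = 0; a < active.size (); )
        if (active[a].first < lr[i].start)
          {
            free_regs.push_back (active[a].second);
            active.erase (active.begin () + (std::ptrdiff_t) a);
          }
        else
          ++a;
      if (!free_regs.empty ())
        {
          assignment[i] = free_regs.back ();
          free_regs.pop_back ();
          active.push_back (std::make_pair (lr[i].end, assignment[i]));
        }
    }
  return assignment;
}

int
main ()
{
  std::vector<LiveRange> ranges;
  LiveRange a = { 0, 10 }, b = { 2, 6 }, c = { 4, 12 }, d = { 7, 9 };
  ranges.push_back (a); ranges.push_back (b);
  ranges.push_back (c); ranges.push_back (d);
  build_interference (ranges);
  linear_scan (ranges, 2);
  return 0;
}

The quadratic graph construction is what blows up on huge functions; the sweep stays cheap but gives up the global view of interference that makes CB produce better code.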

Still, we could improve IRA behaviour with simple solutions, like using a fast algorithm (the one currently used for -O0) for huge functions, or by dividing a function into smaller regions (but that is hard to implement and will not work well for tests where most pseudos have very long live ranges). I will work on it when I am less busy.
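
As a rough illustration of the first idea (just a sketch under assumed names: the threshold value, fast_allocate and chaitin_briggs_allocate are made up and are not GCC's actual interface), the dispatch could be as simple as:

#include <cstddef>
#include <cstdio>

/* Stub standing in for the linear-time priority allocator used at -O0.  */
static void
fast_allocate (void)
{
  std::puts ("fast priority-based allocation");
}

/* Stub standing in for the full Chaitin-Briggs coloring.  */
static void
chaitin_briggs_allocate (void)
{
  std::puts ("Chaitin-Briggs coloring");
}

/* Assumed cutoff; a real heuristic would need tuning, e.g. on the number
   of pseudos and basic blocks.  */
static const std::size_t HUGE_FUNCTION_PSEUDOS = 100000;

static void
allocate_registers (std::size_t num_pseudos)
{
  if (num_pseudos > HUGE_FUNCTION_PSEUDOS)
    fast_allocate ();            /* give up some code quality to stay fast */
  else
    chaitin_briggs_allocate ();  /* normal functions keep CB code quality */
}

int
main ()
{
  allocate_registers (500);      /* typical function: CB coloring */
  allocate_registers (2000000);  /* huge function: fast fallback */
  return 0;
}

The trade-off is explicit: only functions past the threshold lose the CB code quality, so the majority of users see no change while the pathological cases stop dominating compile time.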

About 14 years ago, back in my Cygnus days, I worked on a problem from a customer. At that time GCC was not able to compile a big program of theirs. Fixing GCC would have required a lot of effort. In the end, the customer modified its code to generate smaller functions and the problem was gone. My point is that we could spend a lot of effort fixing corner cases while ignoring improvements for the majority of users. But it seems to me that I'll have to work on these PRs.
