On 12-09-30 1:03 PM, Richard Guenther wrote:
On Sun, Sep 30, 2012 at 6:52 PM, Steven Bosscher <stevenb....@gmail.com> wrote:
Hi,


To look at it in yet another way:

  integrated RA           : 189.34 (16%) usr
  LRA non-specific        :  59.82 ( 5%) usr
  LRA virtuals elimination:  56.79 ( 5%) usr
  LRA create live ranges  : 175.30 (15%) usr
  LRA hard reg assignment : 130.85 (11%) usr

The IRA pass is slower than the next-slowest pass (tree PRE) by almost
a factor of 2.5.  Each of the individually-measured *phases* of LRA is
slower than the complete IRA *pass*.  These 5 timevars together make up
52% of all compile time.
That figure indeed makes IRA + LRA look bad.  Did you by chance identify
anything obvious that can be done to improve the situation?


As I wrote, I don't see that LRA has a problem right now, because even on an 8GB machine GCC with LRA is 10% faster than GCC with reload from a real-time point of view (not to mention that LRA generates 15% smaller code). And real time is what really matters for users.

But I think that the LRA CPU time problem for this test can be fixed. However, I don't think I can fix it within 2 weeks. So if people believe that the current LRA behaviour on this PR is a stopper for including it in gcc4.8, then we should postpone its inclusion until gcc4.9, when I hope to fix it.

As for IRA, it uses the Chaitin-Briggs algorithm, which scales worse than most other optimizations. So the bigger the test, the bigger the percentage of compilation time spent in IRA. I don't believe that somebody can achieve better code using other, faster RA algorithms. LLVM has no such problem because even its new RA (a big improvement for llvm3.0) is not based on the CB algorithm; it is still a modification of linear-scan RA. It would be interesting to check how other compilers behave on this test. The Intel compiler would be the most interesting (but I have doubts that it would do well, because I have seen programs where the Intel compiler with optimizations struggled for more than 40 minutes compiling a single file).
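
To make the scaling difference concrete, here is a minimal, self-contained sketch (in no way GCC's or LLVM's actual code; LiveRange, build_interference and linear_scan are made-up names): a Chaitin-Briggs style allocator has to build an explicit interference graph, which is roughly quadratic in the number of live ranges, while a linear-scan allocator only sorts the ranges and sweeps them once.

/* A minimal sketch, not GCC's or LLVM's actual code: LiveRange,
   build_interference and linear_scan are illustrative names only.  */
#include <cstddef>
#include <algorithm>
#include <utility>
#include <vector>

struct LiveRange
{
  int start;  /* first program point where the pseudo is live */
  int end;    /* last program point where the pseudo is live */
};

/* Chaitin-Briggs style: build an explicit interference graph.  The
   pairwise overlap test alone is O(n^2) in the number of live ranges,
   before simplify/select coloring even starts, which is why graph
   coloring dominates compile time on huge functions.  */
static std::vector<std::vector<int> >
build_interference (const std::vector<LiveRange> &lr)
{
  std::vector<std::vector<int> > adj (lr.size ());
  for (std::size_t i = 0; i < lr.size (); ++i)
    for (std::size_t j = i + 1; j < lr.size (); ++j)
      if (lr[i].start <= lr[j].end && lr[j].start <= lr[i].end)
        {
          adj[i].push_back ((int) j);
          adj[j].push_back ((int) i);
        }
  return adj;
}

static bool
start_before (const LiveRange &a, const LiveRange &b)
{
  return a.start < b.start;
}

/* Linear-scan style: sort the ranges once and sweep them with an active
   list, roughly O(n log n).  It scales much better, but the allocator
   never sees global interference, which is why its code quality usually
   trails CB.  (For simplicity this sketch spills the current range when
   no register is free, instead of the range with the furthest end.)  */
static std::vector<int>
linear_scan (std::vector<LiveRange> lr, int num_hard_regs)
{
  std::sort (lr.begin (), lr.end (), start_before);
  std::vector<int> assignment (lr.size (), -1);  /* -1 means spilled */
  std::vector<std::pair<int, int> > active;      /* (end point, hard reg) */
  std::vector<int> free_regs;
  for (int r = 0; r < num_hard_regs; ++r)
    free_regs.push_back (r);

  for (std::size_t i = 0; i < lr.size (); ++i)
    {
      /* Expire active ranges that ended before this one starts.  */
      for (std::size_t a = 0; a < active.size (); )
        if (active[a].first < lr[i].start)
          {
            free_regs.push_back (active[a].second);
            active.erase (active.begin () + (std::ptrdiff_t) a);
          }
        else
          ++a;
      if (!free_regs.empty ())
        {
          assignment[i] = free_regs.back ();
          free_regs.pop_back ();
          active.push_back (std::make_pair (lr[i].end, assignment[i]));
        }
    }
  return assignment;
}

int
main ()
{
  std::vector<LiveRange> ranges;
  LiveRange a = { 0, 10 }, b = { 2, 6 }, c = { 4, 12 }, d = { 7, 9 };
  ranges.push_back (a); ranges.push_back (b);
  ranges.push_back (c); ranges.push_back (d);
  build_interference (ranges);
  linear_scan (ranges, 2);
  return 0;
}

The quadratic graph construction is what blows up on huge functions; the sweep stays cheap but gives up the global view of interference that makes CB produce better code.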

Still, we could improve IRA behaviour with simple solutions, like using a fast algorithm (the one currently used for -O0) for huge functions, or by dividing a function into smaller regions (but that is hard to implement and will not work well for tests where most pseudos have very long live ranges). I will work on it when I am less busy.
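
As a rough illustration of the first idea (just a sketch under assumed names: the threshold value, fast_allocate and chaitin_briggs_allocate are made up and are not GCC's actual interface), the dispatch could be as simple as:

#include <cstddef>
#include <cstdio>

/* Stub standing in for the linear-time priority allocator used at -O0.  */
static void
fast_allocate (void)
{
  std::puts ("fast priority-based allocation");
}

/* Stub standing in for the full Chaitin-Briggs coloring.  */
static void
chaitin_briggs_allocate (void)
{
  std::puts ("Chaitin-Briggs coloring");
}

/* Assumed cutoff; a real heuristic would need tuning, e.g. on the number
   of pseudos and basic blocks.  */
static const std::size_t HUGE_FUNCTION_PSEUDOS = 100000;

static void
allocate_registers (std::size_t num_pseudos)
{
  if (num_pseudos > HUGE_FUNCTION_PSEUDOS)
    fast_allocate ();            /* give up some code quality to stay fast */
  else
    chaitin_briggs_allocate ();  /* normal functions keep CB code quality */
}

int
main ()
{
  allocate_registers (500);      /* typical function: CB coloring */
  allocate_registers (2000000);  /* huge function: fast fallback */
  return 0;
}

The trade-off is explicit: only functions past the threshold lose the CB code quality, so the majority of users see no change while the pathological cases stop dominating compile time.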

About 14 years ago, back in my Cygnus days, I worked on a problem from a customer. At that time GCC was not able to compile a big program of theirs. Fixing GCC would have required a lot of effort. In the end, the customer modified its code to generate smaller functions and the problem was gone. My point is that we could spend a lot of effort fixing corner cases while ignoring improvements for the majority of users. But it seems to me that I'll have to work on these PRs.
