On 12-09-30 7:15 PM, Steven Bosscher wrote:
> On Mon, Oct 1, 2012 at 12:50 AM, Vladimir Makarov <vmaka...@redhat.com> wrote:
>> As I wrote, I don't see that LRA has a problem right now, because even on
>> an 8GB machine, GCC with LRA is 10% faster than GCC with reload in terms
>> of real time (not to mention that LRA generates 15% smaller code). And real
>> time is what really matters for users.
> For me, those compile times I reported *are* real times.
Sorry, I missed your data (it was buried in the percentage calculations
based on my data). I saw that on my machine the maxrss was 8GB, with a lot
of page faults and low CPU utilization (about 30%). I guess you used a
16GB machine, and 16GB is enough for this test. OK, I'll work on this
problem, although I think it will take some time to solve it or make it
more tolerable. Still, I think it is not right to pay attention only
to compilation time. See my reasons below.
> But you are right that the test case is a bit extreme. Before GCC 4.8
> other parts of the compiler also choked on it. Still, the test case
> comes from real user code (a combination of the Eigen library with MPFR),
> and it shows scalability problems in LRA (and IRA) that one can't just
> "explain away" with an "RA is just expensive" claim. The test case for
> PR26854 is Brad Lucier's Scheme interpreter, which is also real user
> code.
I have written a few interpreters myself, so I looked at the code of the
Scheme interpreter.
It seems to me that it is computer-generated code. So the first
solution would be to generate a few functions instead of one. Generating a
huge function is not wise for performance-critical applications, because
for such corner cases compilers fall back to simpler, faster optimization
algorithms and generate worse code. By the way, I could solve the
compilation time problem by using simpler algorithms at the cost of
performance. The author would be happy with the compilation speed but
disappointed by, say, a 10% slower interpreter. I don't think that is a
solution to the problem; it is creating a bigger problem. It seems I
will have to do this anyway :) On the other hand, if I told him that by
waiting 40% more time he could get 15% smaller code, I guess he would
prefer that. Of course, speeding up the compiler is an interesting
problem, but we don't look at the whole picture when we solve compilation
time by hurting performance.
The scalability problem is a problem of computer-generated programs, and
there is usually a simpler and better solution for it: generating
smaller functions.
By the way, I also found that the author uses labels as values. It is
not the best solution, although there are still a lot of articles
recommending it. A single switch is faster on modern computers. Anton Ertl
proposed using several switches (one switch after each interpreter
insn) for better branch prediction, but I found that this works worse
than the single-switch solution, at least for my interpreters.