On 12-09-30 7:15 PM, Steven Bosscher wrote:
On Mon, Oct 1, 2012 at 12:50 AM, Vladimir Makarov <vmaka...@redhat.com> wrote:
   As I wrote, I don't see that LRA has a problem right now, because even on
an 8GB machine, GCC with LRA is 10% faster than GCC with reload from a real-time
point of view (not to mention that LRA generates 15% smaller code).  And real
time is what really matters for users.
For me, those compile times I reported *are* real times.
Sorry, I missed your data (it was buried in the calculations of percentages from my data). I saw that on my machine maxrss was 8GB, with a lot of page faults and low CPU utilization (about 30%). I guess you used a 16GB machine, and 16GB is enough for this test. OK, I'll work on this problem, although I think it will take some time to solve it or at least make it more tolerable. Still, I think it is not right to pay attention only to compilation time. See my reasons below.
But you are right that the test case is a bit extreme. Before GCC 4.8
other parts of the compiler also choked on it. Still, the test case
comes from real user's code (combination of Eigen library with MPFR),
and it shows scalability problems in LRA (and IRA) that one can't just
"explain away" with an "RA is just expensive" claim. The test case for
PR26854 is Brad Lucier's Scheme interpreter, which is also real user's
code.


I myself have written a few interpreters, so I looked at the code of the Scheme interpreter.

It seems to me that it is computer-generated code. So the first solution would be to generate a few functions instead of one. Generating a huge function is not wise for performance-critical applications, because in these corner cases compilers fall back to simpler, faster algorithms and generate worse code.

By the way, I could solve the compilation-time problem by using simpler algorithms that harm performance. The author would be happy with the compilation speed but disappointed by, say, a 10% slower interpreter. I don't think that is a solution to the problem; it creates a bigger one. It seems to me I have to do this :) Or, if I tell him that by waiting 40% more time he can get 15% smaller code, I guess he would prefer that. Of course it is an interesting problem to speed up the compiler, but we don't look at the whole picture when we address compilation time by hurting performance.

The scalability problem is a problem of computer-generated programs, and usually there is a simpler and better solution for it: generating smaller functions.
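
As an illustration (a hypothetical sketch of what generated output could look like, not the actual generator or interpreter in question), the generator could emit one function per group of opcodes plus a thin top-level dispatcher, so that no single function grows large enough to push the compiler into its slower or lower-quality fallback algorithms:

struct state { int opcode; long acc; };

/* Group 0: opcodes 0..127 (bodies trimmed for the sketch).  */
static long exec_group0 (struct state *s)
{
  switch (s->opcode)
    {
    case 0:  return s->acc + 1;
    case 1:  return s->acc - 1;
    /* ... the rest of this group's cases ... */
    default: return s->acc;
    }
}

/* Group 1: opcodes 128..255.  */
static long exec_group1 (struct state *s)
{
  switch (s->opcode)
    {
    case 128: return s->acc * 2;
    /* ... */
    default:  return s->acc;
    }
}

/* Generated top-level dispatcher.  */
long execute (struct state *s)
{
  return s->opcode < 128 ? exec_group0 (s) : exec_group1 (s);
}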

By the way, I also found that the author uses label values (computed gotos). That is not the best solution, although there are still a lot of articles recommending it. A single switch is faster on modern computers. Anton Ertl proposed using several switches (one switch after each interpreter insn) for better branch prediction, but I found that this works worse than the single-switch solution, at least for my interpreters.
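
To make the comparison concrete, here is a minimal sketch (GNU C; not taken from the Scheme interpreter in question, the opcodes and bytecode layout are made up) of the two dispatch styles for a toy stack machine:

enum op { OP_PUSH, OP_ADD, OP_HALT };

/* Dispatch with label values (GCC's computed-goto extension):
   each handler ends in its own indirect jump.  */
long run_goto (const unsigned char *pc, const long *args)
{
  static void *dispatch[] = { &&op_push, &&op_add, &&op_halt };
  long stack[64], *sp = stack;

  goto *dispatch[*pc];
 op_push:
  *sp++ = args[*++pc];           /* push the immediate operand */
  goto *dispatch[*++pc];
 op_add:
  sp[-2] += sp[-1], --sp;
  goto *dispatch[*++pc];
 op_halt:
  return sp[-1];
}

/* The same interpreter written with a single switch.  */
long run_switch (const unsigned char *pc, const long *args)
{
  long stack[64], *sp = stack;

  for (;;)
    switch (*pc++)
      {
      case OP_PUSH: *sp++ = args[*pc++]; break;
      case OP_ADD:  sp[-2] += sp[-1], --sp; break;
      case OP_HALT: return sp[-1];
      }
}

With the program { OP_PUSH, 0, OP_PUSH, 1, OP_ADD, OP_HALT } and args = { 2, 3 }, both variants return 5; the only difference is whether each handler ends in its own indirect jump or control returns to the shared switch.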

