On 12-09-30 7:15 PM, Steven Bosscher wrote:
> On Mon, Oct 1, 2012 at 12:50 AM, Vladimir Makarov <vmaka...@redhat.com> wrote:
>> As I wrote, I don't see that LRA has a problem right now, because even on
>> an 8GB machine, GCC with LRA is 10% faster than GCC with reload in terms
>> of real time (not to mention that LRA generates 15% smaller code). And real
>> time is what really matters for users.
> For me, those compile times I reported *are* real times.
Sorry, I missed your data (it was buried in the percentage calculations
based on my data). I saw that on my machine the maxrss was 8GB, with a lot
of page faults and low CPU utilization (about 30%). I guess you used a
16GB machine, and 16GB is enough for this test. OK, I'll work on this
problem, although I think it will take some time to solve it or make it
more tolerable. Still, I think it is not right to pay attention only
to compilation time. See my reasons below.
> But you are right that the test case is a bit extreme. Before GCC 4.8
> other parts of the compiler also choked on it. Still, the test case
> comes from real user code (a combination of the Eigen library with MPFR),
> and it shows scalability problems in LRA (and IRA) that one can't just
> "explain away" with an "RA is just expensive" claim. The test case for
> PR26854 is Brad Lucier's Scheme interpreter, which is also real user
> code.
I have written a few interpreters myself, so I looked at the code of the
Scheme interpreter.
It seems to me that it is computer-generated code. So the first
solution would be to generate a few functions instead of one. Generating a
huge function is not wise for performance-critical applications, because
for such corner cases compilers fall back to simpler, faster optimization
algorithms and generate worse code. By the way, I could solve the
compilation time problem by using simpler algorithms at the cost of
performance. The author would be happy with the compilation speed but
disappointed by, say, a 10% slower interpreter. I don't think that is a
solution to the problem; it is creating a bigger problem. It seems I
will have to do this anyway :) On the other hand, if I told him that by
waiting 40% more time he could get 15% smaller code, I guess he would
prefer that. Of course, speeding up the compiler is an interesting
problem, but we don't look at the whole picture when we solve compilation
time by hurting performance.
The scalability problem is a problem of computer-generated programs, and
there is usually a simpler and better solution for it: generating
smaller functions.
By the way, I also found that the author uses labels as values. It is
not the best solution, although there are still a lot of articles
recommending it. A single switch is faster on modern computers. Anton Ertl
proposed using several switches (one switch after each interpreter
insn) for better branch prediction, but I found that this works worse
than the single-switch solution, at least for my interpreters.