On 10/04/2012 01:44 PM, Vladimir Makarov wrote:
On 10/04/2012 12:56 PM, Steven Bosscher wrote:
On Thu, Oct 4, 2012 at 6:12 PM, Vladimir Makarov
<vmaka...@redhat.com> wrote:
so that I get the timings in the -ftime-report like so:
 CPROP                   :  43.14 ( 4%) usr
 integrated RA           : 200.81 (17%) usr
 LRA non-specific        :  62.18 ( 5%) usr
 LRA virtuals elimination:  61.71 ( 5%) usr
 LRA reload inheritance  :   6.41 ( 1%) usr
 LRA create live ranges  : 139.75 (13%) usr
 LRA hard reg assignment : 130.90 (11%) usr
 LRA coalesce pseudo regs:   2.45 ( 0%) usr
 reload                  :   9.09 ( 1%) usr
"Crude, but efficient" (tm) :-)
How do you measure the time spent in that function, and in
remove_some_program_points_and_update_live_ranges?
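For readers wondering how per-phase numbers like those in the report
can be collected without gprof, here is a minimal stand-alone sketch in
C. It is not GCC's timevar code: the phase names and the helpers
phase_begin, phase_end and phase_report are hypothetical, made up for
illustration, and the sketch relies only on the standard clock()
function.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical phase identifiers; the names are illustrative only.  */
enum phase { PHASE_LIVE_RANGES, PHASE_ASSIGN, PHASE_COUNT };

static const char *const phase_name[PHASE_COUNT] = {
  "create live ranges",
  "hard reg assignment"
};

static clock_t phase_start[PHASE_COUNT];
static double phase_total[PHASE_COUNT];   /* accumulated CPU seconds */

/* Record the CPU time at which phase P starts.  */
static void
phase_begin (enum phase p)
{
  assert (p >= 0 && p < PHASE_COUNT);
  phase_start[p] = clock ();
}

/* Add the CPU time elapsed since phase_begin (P) to P's total.  */
static void
phase_end (enum phase p)
{
  assert (p >= 0 && p < PHASE_COUNT);
  phase_total[p] += (double) (clock () - phase_start[p]) / CLOCKS_PER_SEC;
}

/* Print one line per phase.  The percentage is relative to the sum of
   the tracked phases only, which is an assumption of this sketch and
   differs from a report that is relative to total compile time.  */
static void
phase_report (FILE *out)
{
  double sum = 0.0;
  for (int i = 0; i < PHASE_COUNT; i++)
    sum += phase_total[i];
  for (int i = 0; i < PHASE_COUNT; i++)
    fprintf (out, "%-24s: %6.2f (%2.0f%%) usr\n", phase_name[i],
             phase_total[i],
             sum > 0.0 ? 100.0 * phase_total[i] / sum : 0.0);
}
```

Wrapping a region in phase_begin/phase_end and calling phase_report at
exit gives numbers without -pg instrumentation overhead, at the cost of
only seeing the regions that were wrapped by hand.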
You use AMD and I use Intel, so the results may differ from the cache
point of view.
Another thing is that I used gprof (-pg was used for bitmap.o, lra*.o
and ira*.o). Your measurements are more accurate, I think, because they
are taken without instrumentation, and bitmap.o takes too much time.
Bitmaps do not work well in this case because they are too big and sparse.
Yes, the behaviour of gcc17 (AMD) is very different from that of Intel
machines. I think that is why we have such different numbers.
create_start_and_finish_chains takes only 2.4% (28 sec) according to
gprof on slow.cc (before my last patch). Also, on the AMD machine
find_hard_regno_for is in first place (on Intel machines, several bitmap
functions are in first place). That is the function I wanted to look at
more later (to implement some simpler algorithm for huge functions).
I think your patch will become even more important, as my last patch
increases the percentage of time spent in lra-lives.c. So thank you
very much, Steven.
I'd like to play with your patch some more, and I'll probably give you
approval to commit tomorrow.