On 10/04/2012 01:44 PM, Vladimir Makarov wrote:
On 10/04/2012 12:56 PM, Steven Bosscher wrote:
On Thu, Oct 4, 2012 at 6:12 PM, Vladimir Makarov <vmaka...@redhat.com> wrote:

so that I get the timings in the -ftime-report like so:

  CPROP                   :  43.14 ( 4%) usr
  integrated RA           : 200.81 (17%) usr
  LRA non-specific        :  62.18 ( 5%) usr
  LRA virtuals elimination:  61.71 ( 5%) usr
  LRA reload inheritance  :   6.41 ( 1%) usr
  LRA create live ranges  : 139.75 (13%) usr
  LRA hard reg assignment : 130.90 (11%) usr
  LRA coalesce pseudo regs:   2.45 ( 0%) usr
  reload                  :   9.09 ( 1%) usr

"Crude, but efficient" (tm) :-)

How do you measure the time spent in that function, and in
remove_some_program_points_and_update_live_ranges?

You use AMD and I use Intel, so the results may differ from a cache point of view.

Another thing is that I used gprof (-pg was used for bitmap.o, lra*.o and ira*.o). Your measurements are more accurate, I think, because they were made without instrumentation, and with instrumentation bitmap.o takes too much time. Bitmaps do not work well in this case because they are too big and sparse.

Yes, gcc17 (AMD) behaviour is very different from that of Intel machines. I think that is why we have such different numbers. Only create_start_and_finish_chains takes 2.4% (28 sec) according to gprof on slow.cc (before my last patch). Also, on the AMD machine find_hard_regno_for is in first place (on Intel machines, several bitmap functions are in first place). That is the function I wanted to look at more closely later (to implement some simpler algorithm for huge functions).

I think your patch will become even more important, as my last patch increases the percentage of time spent in lra-lives.c. So thank you very much, Steven.

I'd like to play more with your patch, and I'll probably give you approval to commit tomorrow.
