Hi Vladimir, Firstly, thanks for looking into this.
> Analysis of 187.facerec problem was actually easier than applu one. > It has one very hot (80%) function localmove::graphRoutines.f90 and > there is only one hot loop in the function. Although the loop is > pretty big because of inlining TopCostFct. The loop contains a few > if-statements and several switch-statements. But only part of the loop > body is hot. I've been chasing the specific portion of code that performs badly and i've come to the same conclusion. I could isolate some parts of the loop that have a greater effect on the performance, but it's still a big loop. I was going to open a PR for this. Might be a good idea to keep track of the progress, and we can attach the testcases there. > After comparisons of the hot loop parts I found that IRA generates > about 20 insns more which are some stores but mostly load. I did not > find any problem with spilling in reload (that is after fixing one > spilling problem about which I wrote in my previous email). Reload > spills only two registers which lives throughout the hot parts. I've seen quite a number of loads/stores using the same one or two registers over and over again. For example, r12 was heavily used during that hot loop, according to what i saw. > So I concluded that the problem was actually in IRA allocation. I > did not find any wrong in IRA implementation of coloring algorithm. > Changing spill first heuristic there (we choose allocno with smaller > number to spill) from > > SPILL_COST / (NUMBER_OF_LEFT_CONFLICTS * NUMBER_HARD_REGISTER_NEEDED > + 1) > > to > > SPILL_COST * log (NREFS) / (NUMBER_OF_LEFT_CONFLICTS > * NUMBER_HARD_REGISTER_NEEDED > * LIVE_RANGE_LENGTH + 1) > > increased facerec score on Power6 from 1760 to 2039 (vs 1850 for the > old RA). It looks very good but unfortunately the overall SPECFP2000 > score was not changed much and SPECINT2000 was not change at all. > Some programs score became better the rest became worse. > > This is classical example of heuristic approach drawback. You never > get best code using one heuristic for all programs. Sometimes the old > RA will generate better code. What are goal should be is to achieve > better code in "average", i.e. for some credible benchmark like > SPECFP2000 and we can achieve this with IRA. > > It is hard to say what heuristic will work best for given program > and make the choice automatically (especially without profiling > because branch probability prediction algorithm is not that accurate). > Although we could add an option to choose different spill heuristics > in hope to achieve best overall score using machine learning algorithm > like Google or Netflix use to find the better prediction > (e.g. neighborhood based search). There is one promising project for > GCC using this approach. It was reported by Grigory Fursin on this > year GCC Summit. I think it is promising because different spill > heuristics results in very different times (about 20% for facerec). > > Probably I'll submit the new heuristic because SPECFP2000 score a > bit better (about 0.5%) with using this. But I need some time to > check it on other platforms. If you have any patches that you think are worth testing, let me know so i can give them a go. I'll be following closely the progress on this topic. Thanks, Luis