On Wed, Oct 12, 2011 at 8:50 AM, Maxim Kuvyrkov <ma...@codesourcery.com> wrote:
> The following patch adds new knob to make GCC perform several iterations of
> early optimizations and inlining.
>
> This is for dont-care-about-compile-time-optimize-all-you-can scenarios.
> Performing several iterations of optimizations does significantly improve
> code speed on a certain proprietary source base. Some hand-tuning of the
> parameter value is required to get optimum performance. Another good use for
> this option is for search and ad-hoc analysis of cases where GCC misses
> optimization opportunities.
>
> With the default setting of '1', nothing is changed from the current status
> quo.
>
> The patch was bootstrapped and regtested with 3 iterations set by default on
> i686-linux-gnu. The only failures in regression testsuite were due to latent
> bugs in handling of EH information, which are being discussed in a different
> thread.
>
> Performance impact on the standard benchmarks is not conclusive, there are
> improvements in SPEC2000 of up to 4% and regressions down to -2%, see [*].
> SPEC2006 benchmarks will take another day or two to complete and I will
> update the spreadsheet then. The benchmarks were run on a Core2 system for
> all combinations of {-m32/-m64}{-O2/-O3}.
>
> Effect on compilation time is fairly predictable, about 10% compile time
> increase with 3 iterations.
>
> OK for trunk?

I don't think this is a good idea, especially in the form you implemented it.

If we wanted to iterate early optimizations, we would want to do it by
iterating an IPA pass, so that we benefit from more precise size estimates
when trying to inline a function the second time. Also, statically scheduling
the passes will mess up the dump files, and it leaves you no chance of, say,
noticing that nothing changed for function f and its callees in iteration N,
so that you can skip processing them in iteration N + 1.

So, at the least, you should split the pass_early_local_passes IPA pass into
three. You'd iterate over the second (definitely not over pass_split_functions,
though); the third would be pass_profile and pass_split_functions only. And
you'd iterate from the place where the second IPA pass is executed, not by
scheduling them N times.

Then you'd have to analyze the compile-time impact of the IPA splitting on its
own, when not iterating.

Then you should look at which optimizations were actually performed that led
to the improvement (I can see some indirect inlining happening, but everything
else would be a bug in the present optimizers in the early pipeline - they are
all designed to be roughly independent of each other and _not_ to expose new
opportunities by iteration).

Thus - testcases?

Thanks,
Richard.

> [*]
> https://docs.google.com/spreadsheet/ccc?key=0AvK0Y-Pgj7bNdFBQMEJ6d3laeFdvdk9lQ1p0LUFkVFE&hl=en_US
>
> Thank you,
>
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
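
To make the "skip what did not change" point in the reply concrete, here is a
minimal toy sketch in Python. It is not GCC code and does not use the pass
manager's API: the names iterate_early_opts, optimize, callees and remaining
are all hypothetical, and the model assumes that "f and its callees" means f
plus its direct callees. It only shows why driving the iteration from inside
one pass lets round N + 1 revisit just the functions touched in round N, which
a statically scheduled list of N copies of the passes cannot do.

```python
def iterate_early_opts(callees, optimize, max_iterations):
    """Run optimize(fn) over the functions for up to max_iterations rounds.

    callees:  dict mapping a function name to the list of its direct callees.
    optimize: callable(name) -> bool, True if the function body changed.
    Returns a list holding the set of functions processed in each round.
    """
    dirty = set(callees)          # the first round processes every function
    processed_per_round = []
    for _ in range(max_iterations):
        if not dirty:
            break                 # nothing changed last round: stop early
        changed = {fn for fn in sorted(dirty) if optimize(fn)}
        processed_per_round.append(set(dirty))
        # Revisit a function only if it, or one of its direct callees,
        # changed in the round that just finished.
        dirty = {fn for fn, cs in callees.items()
                 if fn in changed or any(c in changed for c in cs)}
    return processed_per_round


# Hypothetical call graph: main -> f -> g, plus an unrelated function h.
callees = {"main": ["f"], "f": ["g"], "g": [], "h": []}

# Stand-in for real optimization work: f has two rounds of improvements
# left, every other function is already as good as it gets.
remaining = {"main": 0, "f": 2, "g": 0, "h": 0}

def optimize(fn):
    if remaining[fn] > 0:
        remaining[fn] -= 1
        return True
    return False

rounds = iterate_early_opts(callees, optimize, max_iterations=3)
for number, processed in enumerate(rounds, start=1):
    print(number, sorted(processed))
```

In this toy run only f ever changes, so after the first round the loop
revisits f and its caller main and leaves g and h alone. A real implementation
would also have to respect the callee-before-caller processing order and the
inliner's size estimates, neither of which is modeled here.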