On Sat, 29 Oct 2011, Maxim Kuvyrkov wrote:
I like this variant a lot better than the last one - still it lacks any
analysis-based justification for iteration (see my reply to Matt on
what I discussed with Honza).
Yes, having a way to tell whether a function has significantly changed
would be awesome. My approach here would be to make inline_parameters
output feedback on how much the size/time metrics have changed for a
function since the previous run. If the change is above X%, then queue the
function's callers for more optimization. Similarly, Martin's
rebuild_cgraph_edges_and_devirt (when that goes into trunk) could queue
new direct callees and the current function for another iteration if new
direct edges were resolved.
Tuning the heuristic will need decent testing on a few projects to find
the "sweet spot" (smallest binary for the time/passes spent) for a given
codebase. With a few data points, a reasonable stab can be made at the
metrics you mention that would not terminate the iterations before the
known optimal number of passes. Without those data points, it seems
difficult to ensure the metrics allow those "sweet spots" to be attained.
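To make that concrete, here's a rough sketch (in C++, with entirely
hypothetical names; this is not the actual cgraph/inline-summary API) of
the change-driven re-queueing Maxim describes:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical per-function summary, as an inline_parameters-style pass
// might produce it.
struct function_summary
{
  double size, time;            // estimates after the current run
  double prev_size, prev_time;  // estimates from the previous run
  std::vector<function_summary *> callers;  // direct callers in the call graph
  bool queued;                  // assume cleared when the summary is built
};

// Fractional change above which a function's callers get another look
// (the "X%" above; 10% is only a placeholder).
static const double change_threshold = 0.10;

static double
relative_change (double prev, double now)
{
  return prev == 0.0 ? 0.0 : std::fabs (now - prev) / prev;
}

// After re-optimizing FN, queue its callers if its metrics moved enough.
static void
maybe_requeue_callers (function_summary *fn,
                       std::queue<function_summary *> &worklist)
{
  double delta = std::max (relative_change (fn->prev_size, fn->size),
                           relative_change (fn->prev_time, fn->time));
  if (delta < change_threshold)
    return;  // FN barely changed; iterating on its callers is unlikely to pay off.
  for (std::size_t i = 0; i < fn->callers.size (); ++i)
    if (!fn->callers[i]->queued)
      {
        fn->callers[i]->queued = true;
        worklist.push (fn->callers[i]);
      }
}

The same worklist discipline would cover the devirtualization case: when
new direct edges are resolved, push the affected functions and iterate
until the worklist drains or the pass limit (the knob) is hit.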
Thus, I don't think we want to
merge this in its current form or in this stage1.
What is the benefit of pushing this to a later release? If anything,
merging the support for iterative optimizations now will allow us to
consider adding the wonderful smartness to it later. In the meantime,
substituting that smartness with a knob is still a great alternative.
I agree (of course). Having the knob will be very useful for testing and
determining the acceptance criteria for the later "smartness". While
terminating early would be a nice optimization, the feature is still
intrinsically useful and deployable without it. In addition, with LTO,
3+ passes were always productive on nearly all the projects/modules I
tested. To be fair, without LTO, going beyond 2-3 passes rarely produced
improvements unless individual compilation units were enormous.
There was also the question of whether some of the improvements seen with
multiple passes were indicative of deficiencies in early inlining, CFG,
SRA, etc. If the knob is available, I'm happy to continue testing on the
same projects I've filed recent LTO/graphite bugs against (glib, zlib,
openssl, scummvm, binutils, etc) and write a report on what I observe as
"suspicious" improvements that perhaps should be caught/made in a single
pass.
It's worth noting again that while this is a useful feature in and of
itself (especially when combined with LTO), it's *extremely* useful when
coupled with the de-virtualization improvements submitted in other
threads. The examples submitted for inclusion in the test suite aren't
academic -- they are reductions of real-world performance issues from a
mature (and shipping) C++-based networking product. Any C++ codebase that
employs physical separation in their designs via Factory patterns,
Interface Segregation, and/or Dependency Inversion will likely see
improvements. To me, these enhancements combine to form one of the biggest
leaps I've seen in C++ code optimization -- code that can be clean, OO,
*and* fast.
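As a purely illustrative example (not one of the submitted test-suite
reductions), the shape is roughly:

#include <memory>

// Callers only ever see the interface; the concrete type hides behind a factory.
struct Codec
{
  virtual ~Codec () {}
  virtual int encode (int x) const = 0;
};

struct FastCodec : Codec
{
  virtual int encode (int x) const { return x * 2 + 1; }
};

std::unique_ptr<Codec>
make_codec ()
{
  return std::unique_ptr<Codec> (new FastCodec ());
}

int
run (int x)
{
  // On a single pass, the call below stays an indirect (virtual) call.  Once
  // make_codec() has been inlined, the dynamic type is known to be FastCodec,
  // so a later pass can devirtualize and then inline encode(); discovering
  // that can itself take another iteration of inlining and analysis.
  std::unique_ptr<Codec> c = make_codec ();
  return c->encode (x);
}

Factory-heavy designs repeat this shape throughout the codebase, which is
why the iterated passes pay off so consistently on that kind of code.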
Richard: If there's any additional testing or information I can reasonably
provide to help get this in for this stage1, let me know.
Thanks!
--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt