Re: Massive performance regression from switching to gcc 4.5

Jan Hubicka Fri, 25 Jun 2010 06:11:12 -0700

> On Fri, Jun 25, 2010 at 8:15 AM, Jonathan Adamczewski
> <jadam...@utas.edu.au> wrote:
> > On 25/06/10 06:39, Richard Guenther wrote:
> >> There are btw. some bugs wrt accounting of functions called once
> >> being inlined in 4.5 which were fixed on trunk which allow extra
> >> inlining.
> >>
> >
> > Are these changes likely to make it onto the 4.5 branch and into (say)
> > 4.5.1?
> 
> Well, I'm always a bit nervous when backporting inline heuristic
> changes as that may trigger latent problems on code where they
> weren't seen before.
> 
> We are talking about revs 158278 and 159931.  And at this point
> I'd leave it to Honza to consider their safety and do and test a
> backport.


The main change in the GCC 4.5 heuristics is that they are no longer driven by
somewhat fuzzy cost estimates that mix size, speed and some legacy behaviour
(such as a bug that completely ignored the existence of loads and stores).
Inlining is now driven by a code size estimate and an estimated speedup: it is
basically a greedy algorithm that tries to maximize speedup within the code
size growth constraints.
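The greedy scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not GCC's code: `CallSite`, `greedy_inline`, the speedup/growth ratio used as the priority, and the single unit-wide growth budget are all hypothetical simplifications. In particular, the sketch never re-evaluates priorities after an inline decision, which a real inliner has to do because inlining changes the costs of other call sites.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical call-site record; GCC's real data structures differ.
struct CallSite {
    int id;          // identifier for the call site
    double speedup;  // estimated time saved by inlining (arbitrary units)
    int growth;      // estimated change in code size if inlined (may be <= 0)
};

// Comparator: the call site with the best speedup per unit of growth
// ends up at the top of the priority queue.
struct WorseRatio {
    static double ratio(const CallSite& c) {
        // Inlining that does not grow the code is always the most attractive.
        return c.growth <= 0 ? 1e300 : c.speedup / c.growth;
    }
    bool operator()(const CallSite& a, const CallSite& b) const {
        return ratio(a) < ratio(b);
    }
};

// Greedily inline call sites in order of speedup/growth while the total
// growth stays within the budget; returns the ids of inlined call sites.
// Priorities are computed once and never updated (a simplification).
std::vector<int> greedy_inline(const std::vector<CallSite>& sites, int budget) {
    assert(budget >= 0);
    std::priority_queue<CallSite, std::vector<CallSite>, WorseRatio> queue(
        WorseRatio{}, sites);
    std::vector<int> inlined;
    int used = 0;
    while (!queue.empty()) {
        CallSite best = queue.top();
        queue.pop();
        if (used + best.growth > budget)
            continue;  // too expensive now; a cheaper candidate may still fit
        used += best.growth;
        inlined.push_back(best.id);
    }
    return inlined;
}
```

For example, with a budget of 8 and call sites (id, speedup, growth) of (1, 10, 5), (2, 1, 10), (3, 4, -2) and (4, 6, 4), the sites are inlined in the order 3, 1, 4; site 2 is skipped because it would exceed the budget.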

When you compile with -Os, inlining happens only when it reduces code size, so
we care pretty much only about the code size metrics.  I suspect the problem
here might be that typical C++ code needs some inlining to make the abstraction
penalty go away.  GCC's -Os implementation is generally tuned for CSiBE and is
somewhat C-centric (which makes sense for the embedded world).  As a result, C++
applications compiled with -Os can see quite noticeable slowdowns (and code
size growth too, since the abstraction is never eliminated).  This can also be
seen on tramp3d (the Pooma testcase), where -Os produces much bigger and much
slower code.
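As a minimal, hypothetical illustration of the abstraction penalty (the `Meters` class and both functions are invented for this example): the two functions compute the same sum, but `sum_wrapped` is only as cheap as `sum_raw` once the accessor `get()` and the iterator operations behind the range-for loop are inlined. If the size heuristic declines to inline them, each becomes an out-of-line call inside the loop. Whether GCC inlines them at a given optimization level depends on its size estimates; this snippet does not measure that.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical thin wrapper: a typical C++ abstraction that costs
// nothing only once its member functions are inlined.
class Meters {
public:
    explicit Meters(double v) : value_(v) {}
    double get() const { return value_; }

private:
    double value_;
};

// Plain C-style loop over raw doubles.
double sum_raw(const std::vector<double>& xs) {
    double total = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i)
        total += xs[i];
    return total;
}

// Same computation through the wrapper and an iterator-based loop.
// get(), begin(), end(), operator++ and operator* are all function
// calls unless the compiler inlines them.
double sum_wrapped(const std::vector<Meters>& xs) {
    double total = 0.0;
    for (const Meters& m : xs)
        total += m.get();
    return total;
}
```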

I would be very interested to know the most obvious cases where we fail to
inline but should not.  It would be most helpful to see the
-fdump-tree-inline_param-details output for those, or to have a self-contained
testcase.

It might benefit both projects if we managed to set up regular Mozilla
benchmarking (similar to what we do for C++ benchmarks at
http://gcc.opensuse.org/c++bench-frescobaldi/ ).  I have been thinking about
this for a while but was somewhat discouraged by the overall complexity of
Mozilla, and we also currently lack the hardware for all the testing we would
like to do.  Mozilla is a wonderful example of a complex real-world C++ app with
a benchmark suite, which makes it a really good target for tuning IPA.

I would also be very interested to know how profile feedback works in this case
(and why it did not work in previous releases).  I maintain both areas of the
compiler and would definitely be happy to do some work to help make it useful
for you.

GCC 4.6 has several changes in the inlining heuristics that might be considered
for backporting if they are found to be _really_ important.  The most
noticeable are probably:

  1) It fixes miscounting of variadic functions (this had quite a large effect
     on GCC itself, since it prevented inlining parts of fatal_error).
  2) It fixes accounting of static functions (previously the overall unit
     change was decreased twice for every offline copy eliminated; that
     accidentally improved codegen for some C++ testcases but caused code
     size growth elsewhere).
  3) The priority queue was fixed, so it now accounts correctly for cost
     changes after inlining (this gave the best improvements in C).
  4) There were speedups in the inlining heuristics when dealing with
     functions that have really many (say over 50000) callers.
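The accounting problem in item 2 can be illustrated with a toy size model. All names and the model itself are hypothetical, not GCC's data structures: inlining copies the callee body into the caller and removes the call sequence, and if that was the last call to a static function, its offline copy can be deleted. The second function reproduces the kind of double subtraction the item describes, which makes the unit appear to shrink more than it actually does.

```cpp
#include <cassert>

// Hypothetical size model; sizes are abstract units, not GCC's estimates.
struct InlineCandidate {
    int callee_body_size;  // size of the callee body copied into the caller
    int call_overhead;     // size of the call sequence removed by inlining
    bool removes_offline;  // true if the offline copy can be deleted afterwards
                           // (e.g. a static function with only this caller)
};

// Change in total unit size if the candidate is inlined.
int unit_growth(const InlineCandidate& c) {
    assert(c.callee_body_size >= 0 && c.call_overhead >= 0);
    int growth = c.callee_body_size - c.call_overhead;
    if (c.removes_offline)
        growth -= c.callee_body_size;  // offline copy is gone: count it once
    return growth;
}

// The kind of mistake described in item 2: the eliminated offline copy
// is subtracted a second time, understating the real growth.
int unit_growth_double_counted(const InlineCandidate& c) {
    int growth = unit_growth(c);
    if (c.removes_offline)
        growth -= c.callee_body_size;
    return growth;
}
```

With a 40-unit body, a 5-unit call sequence and a removable offline copy, the correct growth is 40 - 5 - 40 = -5, while the double-counted version reports -45; when the offline copy has to stay, both report 35.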

2) and 3) need to go together, or we get slowdowns on our current C++ suite.

I am, however, concerned that the problem might be a clash between -Os and
the fact that C++ code generally needs speculative, code-growing inlining to
get rid of abstraction.  Whether we can easily get around this problem depends
on what your abstractions are.  GCC can detect certain forms of constructs that
will go away after inlining, and I was also thinking about adding a small code
growth buffer for -Os inlining if it helps on average.
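A minimal sketch of the -Os decision with the optional growth buffer mentioned above. The class name, the `growth_buffer` parameter and the per-unit bookkeeping are assumptions made for illustration only; this is not an existing GCC option or parameter. With a buffer of 0 the rule reduces to the -Os behaviour described earlier: only call sites whose inlining does not grow the code are accepted.

```cpp
#include <cassert>

// Hypothetical -Os inlining policy with an optional growth buffer.
// Not a GCC option: names and bookkeeping are invented for illustration.
class OsInlinePolicy {
public:
    explicit OsInlinePolicy(int growth_buffer) : growth_buffer_(growth_buffer) {
        assert(growth_buffer >= 0);
    }

    // Decide whether a call site with the given estimated size growth
    // should be inlined; accepted growth is charged against the buffer.
    bool should_inline(int growth) {
        if (growth <= 0)
            return true;  // does not grow the code: always accepted at -Os
        if (used_ + growth > growth_buffer_)
            return false;  // would exceed the allowed speculative growth
        used_ += growth;
        return true;
    }

private:
    int growth_buffer_;  // total growth allowed for the unit (0 = strict -Os)
    int used_ = 0;       // growth already spent on earlier decisions
};
```

In this sketch, shrinking decisions never add to the buffer; that is one possible design choice, made so that a zero buffer matches plain -Os exactly.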

Honza
> 
> Richard.
> 
> > j.
> >
