> It looks like our current documentation has a bug in the following:
>
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
>
>   -finline-functions
>   Consider all functions for inlining, even if they are not declared
>   inline.  The compiler heuristically decides which functions are worth
>   integrating in this way.
>   If all calls to a given function are integrated, and the function is
>   declared static, then the function is normally not output as assembler
>   code in its own right.
>   Enabled at levels -O2, -O3, -Os.  Also enabled by -fprofile-use and
>   -fauto-profile.
>
> It clearly says that -finline-functions is enabled at -O2, -O3 and -Os.
>
> However, when I checked the GCC 9 source code, opts.c has:
>
>   /* -O3 and -Os optimizations.  */
>   /* Inlining of functions reducing size is a good idea with -Os
>      regardless of them being declared inline.  */
>   { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
>
> so it looks like -finline-functions is ONLY enabled at -O3 and -Os, not
> at -O2.  (However, I am confused about why -finline-functions should be
> enabled for -Os?)
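For reference, the setting that is actually in effect at a given optimization
level can be queried directly, and the behaviour the manual describes is easy
to observe on a small test case.  A minimal sketch follows; the file and
function names are invented for illustration, and whether anything is really
inlined is up to the heuristics:

  /* inline-check.c -- sketch for observing -finline-functions.
     File and function names here are invented for illustration.

     Query the effective setting at a given level:
       gcc -O2 -Q --help=optimizers | grep inline-functions
       gcc -O3 -Q --help=optimizers | grep inline-functions

     Compare the generated assembler with and without the flag:
       gcc -O2 -S inline-check.c
       gcc -O2 -finline-functions -S inline-check.c

     If all calls to the static helper are inlined, it need not appear in
     the .s output at all.  Note that very small functions may already be
     inlined at -O2 by -finline-small-functions, so the helper has to be
     big enough to fall outside that heuristic.  */

  static int helper (int a, int b)
  {
    int r = 0;
    for (int i = 0; i < b; i++)
      r += (a ^ i) * (i + 3);
    return r;
  }

  int caller (int x)
  {
    return helper (x, 10) + helper (x + 1, 20);
  }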
Yes, it is a documentation bug.  It seems Eric beat me to fixing it.

>
> >>
> >>> In my test bed this included Firefox with or without LTO, because they
> >>> do "poor man's" LTO by #including multiple .cpp files into a single
> >>> unified source, which makes the average units large.  Also tramp3d and
> >>> DLV from our C++ benchmark suite are affected.
> >>>
> >>> I have some data on Firefox and I will build the remaining ones:
> >> In the following, are the data code sizes?  Is the optimization level
> >> -O3?  What does PGO mean?
> >
> > Those are sizes of libxul, which is the largest library of Firefox.
> > PGO is profile-guided optimization.
>
> Okay, I see.
>
> It looks like for LTO, the code size increase with profiling is much
> smaller than without profiling when the growth is increased from 20% to
> 40%.

With LTO the growth is about 9%, while for non-LTO it is about 4% and with
PGO it is about 3%.  This is expected.  For non-LTO, most translation units
do not hit the limit because most of the calls are external.  Firefox is a
bit special here by using the #include-based unified build, which gets it
closer to LTO, but not quite.  With LTO there is only one translation unit,
and it hits the 20% code size growth limit, which after optimization
translates to that 9%.

With profile feedback the code is partitioned into cold and hot sections,
and only the hot section grows by the given percentage.  For Firefox about
15% of the binary is trained and the rest is cold.

>
> For non-LTO, the code size increase is minimal when the growth is
> increased from 20% to 40%.
>
> However, I do not quite understand the last column; could you please
> explain the last column (-finline-functions) a little bit?

It is a non-LTO build, but with additional -finline-functions.  The GCC
build machinery uses -O2 by default and -O3 for some files; adding
-finline-functions enables aggressive inlining everywhere.  But
double-checking the numbers, I must have cut&pasted the wrong data here.
For growth 20 with -finline-functions, non-LTO, non-PGO, I get 107272791
(so the table is wrong), and increasing the growth to 40 gets me 115311719
(which is correct in the table).

>
> >>>
> >>> growth        LTO+PGO    PGO        LTO         none        -finline-functions
> >>> 20 (default)  83752215   94390023   93085455    103437191   94351191
> >>> 40            85299111   97220935   101600151   108910311   115311719
> >>> clang         111520431  114863807  108437807

It should be:

  growth        LTO+PGO    PGO        LTO         none        -finline-functions
  20 (default)  83752215   94390023   93085455    103437191   107272791
  40            85299111   97220935   101600151   108910311   115311719
  clang         111520431  114863807  108437807

So 7.5% growth.

> > I was poking at this for a while, but did not really have very good
> > testcases available, which made it difficult to judge the code
> > size/performance tradeoffs here.  With Firefox I can measure things
> > better now, and it is clear that 20% growth is just too small.  It is
> > small even with profile feedback, where the compiler knows quite well
> > which calls to inline, and more so without.
>
> Yes, for C++, 20% might be too small, especially for cross-file inlining,
> and C++ applications usually benefit more from inlining.

Yep, that is my conclusion too.
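The growth limit under discussion is the inline-unit-growth parameter, which
can be overridden per invocation for this kind of experiment.  A rough
sketch; the file name and code below are invented, and the libxul numbers
above of course come from full Firefox builds, not from anything this small:

  /* growth-param.c -- rough sketch of varying the unit-growth limit on a
     single non-LTO translation unit; everything here is invented for
     illustration.

       gcc -O2 -finline-functions --param inline-unit-growth=20 -c growth-param.c
       gcc -O2 -finline-functions --param inline-unit-growth=40 -c growth-param.c
       size growth-param.o   # compare the text size of the two objects

     A real unit has to be large enough to actually hit the growth limit
     before the two objects differ.  */

  #define DEFINE_WORKER(name)                  \
    static long name (long x)                  \
    {                                          \
      long r = 0;                              \
      for (long i = 0; i < x; i++)             \
        r += (x ^ i) * (i + 3);                \
      return r;                                \
    }

  DEFINE_WORKER (worker_a)
  DEFINE_WORKER (worker_b)
  DEFINE_WORKER (worker_c)

  long driver (long x)
  {
    return worker_a (x) + worker_b (x + 1) + worker_c (x + 2)
           + worker_a (x + 3) + worker_b (x + 4) + worker_c (x + 5);
  }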
>
> >>
> >> While increasing the value of inline-unit-growth for LTO is one approach
> >> to resolving this issue, might adjusting the sorting heuristic, so that
> >> the important and smaller routines get higher priority for inlining, be
> >> another and better approach?
> >
> > Yes, I have also reworked the inline metrics somewhat and spent quite
> > some time looking into dumps to see that it behaves reasonably.  There
> > were two age-old bugs I fixed in the last two weeks, and I also added
> > some extra tricks like penalizing cross-module inlines some time ago.
> > Given that even with profile feedback I am not able to sort the priority
> > queue well, and neither can Clang do the job, I think it is good
> > motivation to adjust the parameter, which I had set somewhat arbitrarily
> > at a time when I was not able to test it well.
>
> Where is the code for your current heuristic for sorting the inlinable
> candidates?

It is in ipa-inline.c:edge_badness.

If you use -fdump-ipa-inline-details you can search for "Considering" in the
dump file to find a record of every inline decision.  It dumps the badness
value and also the individual values used to compute it.

Honza

>
> Thanks.
>
> Qing
> >
> > Honza
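A minimal sketch of producing the dump described above on a toy unit; the
source file and function names are made up, and the exact dump file name
depends on the pass numbering of the GCC release used:

  /* badness-dump.c -- minimal sketch for producing the inliner dump
     mentioned above; file and function names are made up.

       gcc -O3 -fdump-ipa-inline-details -c badness-dump.c

     This writes a dump file next to the object (its name contains
     "inline"; the exact pass number varies between GCC versions).
     Searching it for "Considering" shows one record per inline decision,
     including the badness value computed in ipa-inline.c:edge_badness and
     the quantities that went into it.  */

  static int leaf (int x)
  {
    return x * x + 7;
  }

  static int middle (int x)
  {
    return leaf (x) + leaf (x + 1);
  }

  int top (int x)
  {
    return middle (x) + middle (2 * x);
  }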