>>> 
>>> Those are sizes of libxul, which is the largest library in Firefox.
>>> PGO is profile-guided optimization.
>> 
>> Okay.  I see. 
>> 
>> Looks like for LTO, the code size increase with profiling is much smaller
>> than that without profiling when growth is increased from 20% to 40%.
> 
> With LTO the growth is about 9%, while for non-LTO it is about 4% and
> with PGO it is about 3%. This is expected.
> 
> For non-LTO, most translation units do not hit the limit because most
> calls are external. Firefox is a bit special here because it uses an
> #include-based unified build that gets it closer to LTO, but not quite.
> 
> With LTO there is only one translation unit, and it hits the 20% code size
> growth limit; after optimization that translates to the 9%.
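A toy model may help show why the same parameter change matters so much more with LTO. It assumes the inliner would like to add some amount of code to a unit but is capped at a percentage of the unit's original size (presumably what `--param inline-unit-growth` controls in GCC). The function `size_after_inlining` and all sizes are hypothetical, and the real inliner works on size estimates taken before later optimizations, consistent with a 20% estimated growth turning into roughly 9% in the final binary.

```python
# Toy model (an assumption, not GCC's actual algorithm): the inliner wants to
# add `wanted` units of size to a translation unit, but the result may not
# exceed base_size * (1 + growth_percent / 100).

def size_after_inlining(base_size: int, wanted: int, growth_percent: int) -> int:
    """Return the unit size after inlining under a unit-growth cap."""
    cap = base_size * growth_percent // 100   # extra size the limit allows
    return base_size + min(wanted, cap)


# Non-LTO unit: most calls are external, so little inlining is possible
# and the cap is never reached.
small_tu = [size_after_inlining(1000, 50, g) for g in (20, 40)]

# LTO: one big unit in which almost every call is visible, so the
# inliner wants far more growth than the cap allows.
lto_unit = [size_after_inlining(1_000_000, 600_000, g) for g in (20, 40)]

print(small_tu)  # [1050, 1050]
print(lto_unit)  # [1200000, 1400000]
```

Raising the limit from 20 to 40 leaves the small unit untouched but lets the single LTO unit grow further, which is the pattern visible in the table.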
> 
> With profile feedback, code is partitioned into cold and hot sections,
> and only the hot section grows by the given percentage. For Firefox,
> about 15% of the binary is trained and the rest is cold.
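The PGO number can be checked with back-of-envelope arithmetic. Assuming, as a simplification, that the hot part uses the whole growth limit and the cold part does not grow at all, raising the limit from 20% to 40% on the roughly 15% of the binary that is trained adds about 0.15 * 20% = 3% to the total, which matches the PGO column (94390023 to 97220935). The helper `pgo_extra_growth` is hypothetical.

```python
def pgo_extra_growth(hot_fraction: float, old_growth: float, new_growth: float) -> float:
    """Extra whole-binary growth when only the hot part follows the limit.

    Simplifying assumptions: the hot part uses the full limit and the cold
    part does not grow at all.
    """
    return hot_fraction * (new_growth - old_growth)


# About 15% of Firefox is trained (hot); limit raised from 20% to 40%.
extra = pgo_extra_growth(0.15, 0.20, 0.40)
print(f"{extra:.1%}")  # 3.0%
```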
>> 
>> For non-LTO, the code size increase is minimal when growth is increased
>> from 20% to 40%.
>> 
>> However, I don't quite understand the last column. Could you please
>> explain the last column (-finline-functions) a little bit?
> 
> It is a non-LTO build, but with additional -finline-functions.
> 
> GCC build machinery uses -O2 by default and -O3 for some files. Adding
> -finline-functions enables aggressive inlining everywhere. But
> double-checking the numbers, I must have cut&pasted the wrong data here.
> For growth 20 with -finline-functions (non-LTO, non-PGO) I get 107272791
> (so the table is wrong), and increasing growth to 40 gets me 115311719
> (which is correct in the table).
>> 
>>>>> 
>>>>> growth         LTO+PGO    PGO        LTO        none       -finline-functions
>>>>> 20 (default)   83752215   94390023   93085455   103437191  94351191
>>>>> 40             85299111   97220935   101600151  108910311  115311719
>>>>> clang          111520431             114863807  108437807
> 
> It should be:
> growth         LTO+PGO    PGO        LTO        none       -finline-functions
> 20 (default)   83752215   94390023   93085455   103437191  107272791
> 40             85299111   97220935   101600151  108910311  115311719
> clang          111520431             114863807  108437807
> 
> So 7.5% growth.
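The per-configuration growth can be recomputed directly from the corrected table with a short script. The values are the libxul sizes at growth 20 and 40; `growth_percent` is just an illustrative helper.

```python
# libxul sizes from the corrected table: (growth 20, growth 40).
sizes = {
    "LTO+PGO": (83752215, 85299111),
    "PGO": (94390023, 97220935),
    "LTO": (93085455, 101600151),
    "none": (103437191, 108910311),
    "-finline-functions": (107272791, 115311719),
}


def growth_percent(old: int, new: int) -> float:
    """Relative size increase from old to new, in percent."""
    return (new - old) / old * 100


for config, (g20, g40) in sizes.items():
    print(f"{config:20} {growth_percent(g20, g40):5.1f}%")
```

This gives roughly 1.8% (LTO+PGO), 3.0% (PGO), 9.1% (LTO), 5.3% (none) and 7.5% (-finline-functions); the plain non-LTO column comes out a little above the "about 4%" mentioned earlier.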
Okay, I see.

>>> 
>>> Yes, I have also reworked the inline metrics somewhat and spent quite
>>> some time looking into dumps to see that it behaves reasonably. There
>>> were two age-old bugs I fixed in the last two weeks, and some time ago I
>>> also added some extra tricks like penalizing cross-module inlines. Given
>>> that even with profile feedback I am not able to sort the priority queue
>>> well, and neither can Clang, I think it is good motivation to adjust the
>>> parameter, which I set somewhat arbitrarily at a time when I was not
>>> able to test it well.
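The general shape of a priority-queue driven inliner with a unit-growth limit can be sketched as follows. This is not GCC's implementation: the badness formula (size growth divided by estimated time saved), the `Candidate` type and the edge names are all hypothetical, and the real `edge_badness` in ipa-inline.c combines many more factors and updates keys as inlining proceeds.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Candidate:
    """A hypothetical call edge that could be inlined."""
    badness: float                        # lower is better; the only sort key
    name: str = field(compare=False)
    size_growth: int = field(compare=False)


def make_candidate(name: str, size_growth: int, time_saved: float) -> Candidate:
    # Hypothetical badness: size growth per unit of time saved.
    return Candidate(size_growth / time_saved, name, size_growth)


def greedy_inline(candidates, unit_size: int, growth_percent: int):
    """Inline candidates best-first until the unit-growth limit is reached."""
    limit = unit_size * (100 + growth_percent) // 100
    heap = list(candidates)               # copy so the caller's list is untouched
    heapq.heapify(heap)
    inlined = []
    while heap:
        best = heapq.heappop(heap)
        if unit_size + best.size_growth > limit:
            continue                      # does not fit; a smaller one still might
        unit_size += best.size_growth
        inlined.append(best.name)
    return inlined, unit_size


edges = [
    make_candidate("a->b", 40, 80.0),     # badness 0.5
    make_candidate("a->c", 30, 10.0),     # badness 3.0
    make_candidate("d->e", 60, 60.0),     # badness 1.0
]
print(greedy_inline(edges, 1000, 10))     # (['a->b', 'd->e'], 1100)
```

With a 10% limit the worst edge (`a->c`) no longer fits; with `growth_percent=20` all three are inlined and the unit ends at 1130. When the limit cuts in, whatever sits at the tail of the queue is dropped, so presumably an imperfect ordering makes the choice of limit matter more.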
>> 
>> Where is the code for your current heuristic for sorting the inlinable
>> candidates?
> 
> It is in ipa-inline.c:edge_badness.
> If you use -fdump-ipa-inline-details, you can search for "Considering" in
> the dump file to find the record of every inline decision. It dumps the
> badness value and also the individual values used to compute it.
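A small script can pull the "Considering" records out of the dump and list them by badness. The record layout shown in the comment is an assumption based on typical `-fdump-ipa-inline-details` output and may differ between GCC versions, so the regular expressions may need adjusting; `inline_decisions` and `scan_dump` are hypothetical helpers.

```python
import re

# Assumed layout of one record (check it against your own dump; the exact
# wording differs between GCC versions):
#   Considering foo/12 with 23 size
#    to be inlined into bar/34 in t.c:10
#    Estimated badness is -0.500000, frequency 1.00.
CONSIDERING = re.compile(r"Considering (\S+) with (\d+) size")
BADNESS = re.compile(r"badness is (-?\d+(?:\.\d+)?(?:e[-+]?\d+)?)")


def inline_decisions(lines):
    """Yield (callee, size, badness) for every "Considering" record."""
    current = None
    for line in lines:
        m = CONSIDERING.search(line)
        if m:
            current = (m.group(1), int(m.group(2)))
            continue
        m = BADNESS.search(line)
        if m and current is not None:
            yield current[0], current[1], float(m.group(1))
            current = None  # take only the first badness line per record


def scan_dump(path):
    """Print the records of an inline dump sorted by badness (best first)."""
    with open(path) as dump:
        records = sorted(inline_decisions(dump), key=lambda r: r[2])
    for callee, size, badness in records:
        print(f"{badness:12.6f}  {size:6d}  {callee}")
```

Call `scan_dump` with the path of the inline dump GCC wrote; its name typically ends in `.inline` and includes a pass number that varies between versions.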

Thanks, I will take a look at it.

Qing
> 
