>>>
>>> Those are sizes of libxul, which is the largest library of Firefox.
>>> PGO is profile guided optimization.
>>
>> Okay. I see.
>>
>> It looks like for LTO, the code size increase with profiling is much smaller
>> than that without profiling when growth is increased from 20% to 40%.
>
> With LTO the growth is about 9%, while for non-LTO it is about 4% and
> with PGO it is about 3%. This is expected.
>
> For non-LTO most translation units do not hit the limit because most
> calls are external. Firefox is a bit special here in using the #include
> based unified build, which gets it closer to LTO, but not quite.
>
> With LTO there is only one translation unit, and it hits the 20% code size
> growth limit, which after optimization translates to that 9%.
>
> With profile feedback the code is partitioned into cold and hot sections,
> where only the hot section grows by the given percentage. For Firefox
> about 15% of the binary is trained and the rest is cold.
>>
>> For non-LTO, the code size increase is minimal when growth is increased from
>> 20% to 40%.
>>
>> However, I do not quite understand the last column. Could you please explain
>> a little bit about the last column (-finline-functions)?
>
> It is a non-LTO build but with additional -finline-functions.
>
> GCC build machinery uses -O2 by default and -O3 for some files. Adding
> -finline-functions enables aggressive inlining everywhere. But double
> checking the numbers, I must have cut&pasted wrong data here.
> For growth 20 with -finline-functions, non-LTO, non-PGO I get 107272791
> (so the table is wrong), and increasing growth to 40 gets me 115311719
> (which is correct in the table).
>>
>>>>>
>>>>> growth        LTO+PGO   PGO       LTO        none       -finline-functions
>>>>> 20 (default)  83752215  94390023  93085455   103437191  94351191
>>>>> 40            85299111  97220935  101600151  108910311  115311719
>>>>> clang 111520431 114863807 108437807
>
> It should be:
> growth        LTO+PGO   PGO       LTO        none       -finline-functions
> 20 (default)  83752215  94390023  93085455   103437191  107272791
> 40            85299111  97220935  101600151  108910311  115311719
> clang 111520431 114863807 108437807
>
> So 7.5% growth.

Okay, I see.
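
The percentages quoted in this thread can be rechecked from the corrected
table. Below is a small Python sketch, assuming the numbers are libxul sizes
as stated above. The dictionary layout and the function name are only
illustrative and do not come from the thread.

```python
# Sizes of libxul taken from the corrected table in this thread.
# Each value is (size at growth 20, size at growth 40).
SIZES = {
    "LTO+PGO": (83752215, 85299111),
    "PGO": (94390023, 97220935),
    "LTO": (93085455, 101600151),
    "none": (103437191, 108910311),
    "-finline-functions": (107272791, 115311719),
}


def growth_percent(before, after):
    """Relative size increase in percent when going from before to after."""
    return (after - before) / before * 100.0


for config, (at_20, at_40) in SIZES.items():
    print(f"{config:20s} {growth_percent(at_20, at_40):5.1f}%")
```

This reproduces the roughly 9% for LTO, 3% for PGO and 7.5% for
-finline-functions mentioned above. For the plain non-LTO column it comes out
nearer 5.3% than 4%.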
>>>
>>> Yes, I have also reworked the inline metrics somewhat and spent quite
>>> some time looking into dumps to see that it behaves reasonably. There
>>> were two ages-old bugs I fixed in the last two weeks, and I also added
>>> some extra tricks like penalizing cross-module inlines some time ago.
>>> Given the fact that even with profile feedback I am not able to sort the
>>> priority queue well and neither can Clang do the job, I think it is good
>>> motivation to adjust the parameter, which I had set somewhat arbitrarily
>>> at a time I was not able to test it well.
>>
>> Where is the code for your current heuristic for sorting the inlinable
>> candidates?
>
> It is in ipa-inline.c:edge_badness.
> If you use -fdump-ipa-inline-details you can search for "Considering" in
> the dump file to find a record of every inline decision. It dumps the
> badness value and also the individual values used to compute it.

Thanks, I will take a look at it.

Qing
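
To follow the suggestion above, the dump can be filtered for the
"Considering" records. This is a minimal Python sketch. Nothing is assumed
about the record format beyond the word "Considering" appearing on the first
line of each record. The number of follow-up lines printed is an arbitrary
choice, and the helper name is hypothetical.

```python
import sys


def considering_records(lines, context=3):
    """Yield each line containing "Considering" plus the next `context` lines.

    The number of follow-up lines is an arbitrary choice for illustration;
    the real records in the dump may be longer or shorter.
    """
    lines = list(lines)
    for i, line in enumerate(lines):
        if "Considering" in line:
            yield lines[i:i + 1 + context]


if __name__ == "__main__" and len(sys.argv) > 1:
    # The dump file path is passed on the command line; its exact name
    # depends on the compiled source file and on GCC's dump numbering.
    with open(sys.argv[1], errors="replace") as dump:
        for record in considering_records(dump):
            sys.stdout.writelines(record)
            sys.stdout.write("\n")
```

Run it with the path of the dump file produced by -fdump-ipa-inline-details
as the only argument, and raise `context` if the badness details are cut off.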