> On May 11, 2023, at 12:08 PM, Qing Zhao via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>
>
>
>> On May 10, 2023, at 9:15 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>
>>> Honza,
>>>> Main motivation for this was profiling programs that contain specific
>>>> code paths for different CPUs (such as graphics library in Firefox or Linux
>>>> kernel). In the situation where the training machine differs from the
>>>> machine the program is run on later, we end up optimizing for size all
>>>> code paths except ones taken by the specific CPU. This patch essentially
>>>> tells gcc to consider every non-trained function as built without profile
>>>> feedback.
>>> Makes sense.
>>>>
>>>> For Firefox it had important impact on graphics rendering tests back
>>>> then since the build machine had AVX while the benchmarking machine did
>>>> not. Some benchmarks improved several times which is not a surprise if you
>>>> consider tight graphics rendering loop optimized for size versus
>>>> vectorized one.
>>>
>>> That's a lot of improvement. So, without -fprofile-partial-training, the
>>> PGO hurt the performance for those cases?
>>
>> Yes, to get code size improvements we assume that the non-trained part
>> of code is cold and with -Os we are very aggressive to optimize for
>> size. We now have two-level optimize_for size, so I think we could
>> make this more fine grained this stage1.
>
> Okay. I see.
>
> Thanks a lot for the info.
>
> Another question (which is confusing us very much right now):
>
> When we lower the following parameter from 999 to 950: (in GCC8)
>
> DEFPARAM(HOT_BB_COUNT_WS_PERMILLE,
>          "hot-bb-count-ws-permille",
>          "A basic block profile count is considered hot if it contributes to "
>          "the given permillage of the entire profiled execution.",
>          999, 0, 1000)
>
> The size of the "text.hot" section is 4x SMALLER than the default one.
> Is this expected behavior?

After further study of GCC8: yes, this is the expected behavior. :-)

Qing

> (From my reading of the GCC8 source code, when this parameter is getting
> smaller, more basic blocks and functions will
> be considered as HOT by GCC, then the text.hot section should be larger, not
> smaller, do I miss anything here?)
>
> Thanks a lot for your help.
>
> Qing
>
>>
>> Honza
>>>
>>>> The patch has bad effect on code size which in turn
>>>> impacts performance too, so I think it makes sense to use
>>>> -fprofile-partial-training with a bit of care (i.e. only on code where
>>>> such scenarios are likely).
>>>
>>> Right.
>>>>
>>>> As for backporting, I do not have checkout of GCC 8 right now. It
>>>> depends on profile infrastructure that was added in 2017 (so stage1 of
>>>> GCC 8), so the patch may backport quite easily. I am not 100% sure
>>>> what shape the infrastructure was in the first version, but I am quite
>>>> convinced it had the necessary bits - it was able to make the difference
>>>> between 0 profile count and missing profile feedback.
>>>
>>> This is good to know, I will try to backport to GCC8 and let them test to
>>> see any good impact.
>>>
>>> Qing
>>>>
>>>> Honza
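
One way to see why lowering hot-bb-count-ws-permille from 999 to 950 can shrink text.hot rather than grow it: if the parameter is read as a working-set cutoff, a block is hot only when its count is at least the count of the last block needed to cover that permillage of the total profiled execution. A smaller permillage needs fewer blocks, so the cutoff count rises and fewer blocks qualify. The Python sketch below is only a simplified model of that idea, not GCC's implementation; the function names `hot_threshold` and `hot_blocks` and the block counts are made up for illustration.

```python
def hot_threshold(counts, permille):
    """Count of the last block needed to cover `permille`/1000 of all execution.

    Simplified model (an assumption, not GCC code): take blocks from the
    most frequently executed downward until the covered share reaches
    the requested permillage.
    """
    total = sum(counts)
    covered = 0
    for c in sorted(counts, reverse=True):
        covered += c
        # Integer arithmetic avoids floating-point rounding at the boundary.
        if covered * 1000 >= total * permille:
            return c
    return 0  # only reached for an empty profile


def hot_blocks(counts, permille):
    """Blocks whose count is at least the working-set threshold."""
    threshold = hot_threshold(counts, permille)
    return [c for c in counts if c >= threshold]


# Hypothetical basic-block execution counts; they sum to 1000.
counts = [900, 50, 30, 15, 4, 1]

print(hot_threshold(counts, 999), len(hot_blocks(counts, 999)))  # 4 5
print(hot_threshold(counts, 950), len(hot_blocks(counts, 950)))  # 50 2
```

In this toy profile the 999 setting marks five of six blocks hot, while 950 marks only two, which is the same direction as the smaller text.hot section reported above.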