> On May 11, 2023, at 12:08 PM, Qing Zhao via Gcc-patches 
> <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> 
>> On May 10, 2023, at 9:15 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> 
>>> Honza,
>>>> Main motivation for this was profiling programs that contain specific
>>>> code paths for different CPUs (such as graphics library in Firefox or Linux
>>>> kernel). When the training machine differs from the machine the
>>>> program is later run on, we end up optimizing for size all code paths
>>>> except ones taken by the specific CPU.  This patch essentially tells gcc
>>>> to consider every non-trained function as built without profile
>>>> feedback.
>>> Makes sense.
>>>> 
>>>> For Firefox it had important impact on graphics rendering tests back
>>>> then since the build machine had AVX while the benchmarking one did not.
>>>> Some benchmarks improved several times which is not a surprise if you
>>>> consider tight graphics rendering loop optimized for size versus
>>>> vectorized one.  
>>> 
>>> That’s a lot of improvement. So, without -fprofile-partial-training, the 
>>> PGO hurt the performance for those cases? 
>> 
>> Yes, to get code size improvements we assume that the non-trained part
>> of code is cold, and with -Os we are very aggressive in optimizing for
>> size.  We now have two-level optimize_for size, so I think we could
>> make this more fine-grained this stage1.
> 
> Okay. I see. 
> 
> Thanks a lot for the info.
> 
> Another question (which is confusing us very much right now):
> 
> When we lower the following parameter from 999 to 950 (in GCC8):
> 
> DEFPARAM(HOT_BB_COUNT_WS_PERMILLE,
>         "hot-bb-count-ws-permille",
>         "A basic block profile count is considered hot if it contributes to "
>         "the given permillage of the entire profiled execution.",
>         999, 0, 1000)
> 
> The size of the "text.hot" section is 4x SMALLER than with the default value. 
> Is this expected behavior? 

After further study of the GCC8 source: yes, this is the expected behavior. :-)
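
To make the direction of the effect concrete, here is a small sketch. It is not GCC code: hot_threshold and count_hot are hypothetical helpers, and it assumes the hot threshold is the smallest count among the hottest blocks that together cover the requested permillage of all profiled execution. Under that assumption, a lower permillage is reached with fewer blocks, so the threshold goes up and fewer blocks count as hot.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator: orders counts from largest to smallest. */
static int cmp_desc(const void *a, const void *b)
{
    unsigned long long x = *(const unsigned long long *)a;
    unsigned long long y = *(const unsigned long long *)b;
    return (x < y) - (x > y);
}

/* Hypothetical helper, not a GCC function.
   Sorts counts[] in place (descending) and returns the smallest count
   among the hottest blocks whose cumulative count reaches
   permille/1000 of the total.  Returns 0 for an empty profile. */
static unsigned long long hot_threshold(unsigned long long *counts,
                                        size_t n, unsigned permille)
{
    unsigned long long total = 0, cum = 0;
    size_t i;

    assert(permille <= 1000);
    qsort(counts, n, sizeof *counts, cmp_desc);

    for (i = 0; i < n; i++)
        total += counts[i];

    for (i = 0; i < n; i++) {
        cum += counts[i];
        /* Integer form of: cum / total >= permille / 1000. */
        if (cum * 1000 >= total * permille)
            return counts[i];
    }
    return 0;
}

/* Hypothetical helper: number of blocks at or above the threshold. */
static size_t count_hot(const unsigned long long *counts, size_t n,
                        unsigned long long threshold)
{
    size_t i, hot = 0;

    for (i = 0; i < n; i++)
        if (counts[i] >= threshold)
            hot++;
    return hot;
}
```

With the made-up counts {1000, 500, 300, 100, 50, 30, 15, 5} (total 2000), permille 999 gives a threshold of 5 and all 8 blocks are hot, while permille 950 gives a threshold of 100 and only 4 blocks are hot.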

Qing
> (From my reading of the GCC8 source code, when this parameter gets
> smaller, more basic blocks and functions will be considered HOT by GCC,
> so the text.hot section should be larger, not smaller. Am I missing
> anything here?)
> 
> Thanks a lot for your help.
> 
> Qing
> 
>> 
>> Honza
>>> 
>>>> The patch has a bad effect on code size, which in turn
>>>> impacts performance too, so I think it makes sense to use
>>>> -fprofile-partial-training with a bit of care (i.e. only on code where
>>>> such scenarios are likely).
>>> 
>>> Right. 
>>>> 
>>>> As for backporting, I do not have a checkout of GCC 8 right now. It
>>>> depends on profile infrastructure that was added in 2017 (so stage1 of
>>>> GCC 8), so the patch may backport quite easily.  I am not 100% sure
>>>> what shape the infrastructure was in in the first version, but I am quite
>>>> convinced it had the necessary bits - it was able to make the difference
>>>> between 0 profile count and missing profile feedback.
>>> 
>>> This is good to know. I will try to backport it to GCC8 and let them test
>>> to see whether it has any good impact.
>>> 
>>> Qing
>>>> 
>>>> Honza
