Honza,

Thanks a lot for your comments.

> On May 9, 2023, at 6:22 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>
>>>>> From my understanding, -fprofile-partial-training is one important option
>>>>> for PGO performance.
>>>>
>>>> I don't think so, speed benefit would be rather small I guess.
>>> I saw some articles online to introduce this option for gcc10,
>>> https://documentation.suse.com/sbp/all/html/SBP-GCC-10/index.html#sec-gcc10-pgo
>>
>> Hi.
>>
>> Ah, I see.
>>
>>> And also based on my previous experience in Studio compiler, I guess that
>>> this one might have
>>> some good performance impact on PGO. Is there any old performance data on
>>> this option? (I cannot find online)
>>
>> Maybe Honza can chime in here? Or Martin who is the author of the white
>> paper.
>
> Main motivation for this was profiling programs that contain specific
> code paths for different CPUs (such as graphics library in Firefox or Linux
> kernel). In the situation where the training machine differs from the machine
> the program is run on later, we end up optimizing for size all code paths
> except the ones taken by the specific CPU. This patch essentially tells gcc
> to consider every non-trained function as built without profile
> feedback.

Makes sense.

> For Firefox it had important impact on graphics rendering tests back
> then since the build machine had AVX while the benchmarking one did not.
> Some benchmarks improved several times which is not a surprise if you
> consider tight graphics rendering loop optimized for size versus
> vectorized one.

That's a lot of improvement. So, without -fprofile-partial-training, PGO hurt the performance for those cases?

> The patch has bad effect on code size which in turn
> impacts performance too, so I think it makes sense to use
> -fprofile-partial-training with a bit of care (i.e. only on code where
> such scenarios are likely).

Right.

> As for backporting, I do not have checkout of GCC 8 right now. It
> depends on profile infrastructure that was added in 2017 (so stage1 of
> GCC 8), so the patch may backport quite easily. I am not 100% sure
> what shape the infrastructure was in the first version, but I am quite
> convinced it had the necessary bits - it was able to make the difference
> between 0 profile count and missing profile feedback.

This is good to know. I will try to backport it to GCC 8 and let them test it to see whether it has any good impact.

Qing

> Honza
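
For reference, the scenario Honza describes (CPU-specific code paths where the training machine differs from the deployment machine) can be sketched roughly as below. This is a hypothetical illustration only, not code from Firefox or the kernel: the function names `sum_generic`, `sum_avx_path`, `cpu_has_avx` and `sum_dispatch` are made up, and `sum_avx_path` is a plain C stand-in rather than real AVX code. The build commands in the comment use the standard GCC options `-fprofile-generate`, `-fprofile-use` and `-fprofile-partial-training`; the file name `demo.c` is an assumption.

```c
#include <stddef.h>

/* Hypothetical sketch of CPU-specific dispatch under PGO.
 *
 * Assumed build steps (demo.c is a made-up file name):
 *   gcc -O2 -fprofile-generate demo.c -o demo    # instrumented build
 *   ./demo                                       # training run
 *   gcc -O2 -fprofile-use -fprofile-partial-training demo.c -o demo
 */

/* Path taken on CPUs without AVX. If the training machine has AVX,
 * this function is never executed during the training run. */
static float sum_generic(const float *v, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Stand-in for an AVX-tuned variant (plain C here, two accumulators). */
static float sum_avx_path(const float *v, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n)
        s0 += v[i];   /* odd element left over */
    return s0 + s1;
}

/* Runtime CPU check; falls back to "no AVX" on non-x86 targets. */
static int cpu_has_avx(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
#else
    return 0;
#endif
}

/* Only one of the two paths is exercised on any given training machine. */
float sum_dispatch(const float *v, size_t n)
{
    return cpu_has_avx() ? sum_avx_path(v, n) : sum_generic(v, n);
}
```

With plain `-fprofile-use`, a path that was never executed in training (here `sum_generic` on an AVX machine) would, per Honza's description, end up optimized for size. With `-fprofile-partial-training` it should instead be treated as if it had no profile feedback, at the cost of larger code.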