Hi Honza,

> On 26 May 2025, at 7:48 pm, Jan Hubicka <hubi...@ucw.cz> wrote:
>
>>> On 26 May 2025, at 5:34 pm, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>
>>> Hi,
>>> also, please, can you add a testcase? We should have some coverage for
>>> auto-fdo specific issues....
>> I was looking for this too. AFAIK we don't do any testing currently.
>> We could
>>
>> 1. Add gcov files as part of the test. However, this would make updating
>> gcov versions difficult.
>> 2. We could add an execution test that also uses autofdo tools to generate
>> .gcov. This would make them slow.
>> Also we may not be able to match exact profile values and only see if afdo
>> annotations are there.
>
> There is testsuite coverage, but it is currently enabled only for Intel
> based x86_64 CPUs and I think no-one runs it regularly. To get AutoFDO
> into a good shape we definitely need to enable it on more setups and also
> start testing/benchmarking regularly.

I will look into this. We also want to enable it for aarch64.

> For a long time I had no easy access to a CPU with AutoFDO support, but
> now I have a zen3 based desktop and also use a zen5 based box for testing.
> I think the attached patch makes the testsuite do the right thing on AMD
> Zens 3, 4 and 5.
>
> I get the following failures on Zen5:
> FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining add1/1 into main/4."
> FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining sub1/2 into main/4."
> FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function ..;"
> FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
> FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"
>
> while on Intel CPU I get:
> FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining add1/1 into main/4."
> FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining sub1/2 into main/4."
> FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-tree-dump-not optimized "Invalid sum"
> FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function ..;"
> FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
> FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"
> FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled likely exits: likely decreased number of iterations of loop 1"
> FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled all exits: decreased number of iterations of loop 2"
> FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-tree-dump-not optimized "Invalid sum"
>
> I have not yet dug into where the differences come from.
>
> Andy, does the patch make sense to you? I simply followed the kernel's
> auto-fdo instructions for clang and built the current git version of
> create_gcov. In the past I always had trouble getting create_gcov
> working with the version of perf distributed by openSUSE, but this time it
> seems to work even though it complains:
>
> [WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1322] Skipping 4228 bytes of metadata: HEADER_CPU_TOPOLOGY
> [WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_ID_INDEX
> [WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_EVENT_UPDATE
> [WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_CPU_MAP
> [WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_82
> [INFO:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1060] Number of events stored: 2178
> [INFO:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_parser.cc:272] Parser processed: 5 MMAP/MMAP2 events, 2 COMM events, 0 FORK events, 1 EXIT events, 2108 SAMPLE events, 2099 of these were mapped, 0 SAMPLE events with a data address, 0 of these were mapped
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I20250525 22:10:18.478610 1692721 sample_reader.cc:289] No buildid found in binary
> W20250525 22:10:18.479000 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=4
> W20250525 22:10:18.479007 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1057->0 index=2
> W20250525 22:10:18.479010 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=6
> W20250525 22:10:18.479013 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=6
> W20250525 22:10:18.479017 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1057->0 index=8
> W20250525 22:10:18.479019 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=c
> I20250525 22:10:18.479228 1692721 symbol_map.cc:477] Adding loadable exec segment: offset=1000 vaddr=401000
>
> Did someone run SPEC recently? I made an auto-FDO spec config and tested
> -Ofast with ipa-icf, ipa-cp-clone and ipa-sra disabled (to get rid of
> the clone merging). I get sort of comparable results as w/o profile at
> all. This is actually not _that_ bad a start - it means that the data is
> probably not completely bogus, just not very useful :)
> (Without disabling ipa-cp, for example, exchange regresses a lot since
> all profile info of the hot clone is lost).

We did test GCC AutoFDO with SPE profiling (using some local patches for
the AutoFDO tools) and with BRBE on aarch64. I was comparing it against PGO.
The cloning issue shows up with 648.exchange2_s.

Thanks,
Kugan

> About the pre-ipa and post-ipa clone issues I think we may need to list
> names of clones that are created late we want to drop and keep it up to
> date, perhaps inventing a clones.def file...
>
> Honza
>
> contrib/ChangeLog:
>
>       * gen_autofdo_event.py: Add support for AMD Zen 3 and
>       later CPUs.
>
> gcc/ChangeLog:
>
>       * config/i386/gcc-auto-profile: Regenerate.
>
> diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
> index 4364e5ce072..b1d373f82fe 100755
> --- a/contrib/gen_autofdo_event.py
> +++ b/contrib/gen_autofdo_event.py
> @@ -138,8 +138,16 @@ if [ "$1" = "--all" ] ; then
>      shift
>  fi
>
> -if ! grep -q Intel /proc/cpuinfo ; then
> -  echo >&2 "Only Intel CPUs supported"
> +if grep -q AuthenticAMD /proc/cpuinfo ; then
> +  vendor=AMD
> +  if ! grep -q " brs" /proc/cpuinfo && ! grep -q amd_lbr_v2 /proc/cpuinfo ; then
> +    echo >&2 "AMD CPU with brs (Zen 3) or amd_lbr_v2 (Zen 4+) feature is required"
> +    exit 1
> +  fi
> +elif grep -q Intel /proc/cpuinfo ; then
> +  vendor=Intel
> +else
> +  echo >&2 "Only AMD and Intel CPUs supported"
>    exit 1
>  fi
>
> @@ -147,7 +155,7 @@ if grep -q hypervisor /proc/cpuinfo ; then
>    echo >&2 "Warning: branch profiling may not be functional in VMs"
>  fi
>
> -case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
> +case `test $vendor = Intel && grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
>    grep -E "^model\s*:" /proc/cpuinfo | head -n1` in''')
>  for event, mod in eventmap.items():
>      for m in mod[:-1]:
> @@ -156,8 +164,13 @@ case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
>  print(r'''*)
>  if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; then
>      E=br_inst_retired.near_taken:p
> +  elif perf list ex_ret_brn_tkn | grep -q ex_ret_brn_tkn ; then
> +    E=ex_ret_brn_tkn:P$FLAGS
> +  elif test $vendor = Intel ; then
> +echo >&2 "Unknown Intel CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
> +    exit 1
>  else
> -echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
> +echo >&2 "AMD CPU without support for ex_ret_brn_tkn event"
>      exit 1
>  fi ;;''')
>  print(r"esac")
> diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
> index 528b34e4240..0e9e5fec2fe 100755
> --- a/gcc/config/i386/gcc-auto-profile
> +++ b/gcc/config/i386/gcc-auto-profile
> @@ -24,8 +24,16 @@ if [ "$1" = "--all" ] ; then
>      shift
>  fi
>
> -if ! grep -q Intel /proc/cpuinfo ; then
> -  echo >&2 "Only Intel CPUs supported"
> +if grep -q AuthenticAMD /proc/cpuinfo ; then
> +  vendor=AMD
> +  if ! grep -q " brs" /proc/cpuinfo && ! grep -q amd_lbr_v2 /proc/cpuinfo ; then
> +    echo >&2 "AMD CPU with brs (Zen 3) or amd_lbr_v2 (Zen 4+) feature is required"
> +    exit 1
> +  fi
> +elif grep -q Intel /proc/cpuinfo ; then
> +  vendor=Intel
> +else
> +  echo >&2 "Only AMD and Intel CPUs supported"
>    exit 1
>  fi
>
> @@ -33,7 +41,7 @@ if grep -q hypervisor /proc/cpuinfo ; then
>    echo >&2 "Warning: branch profiling may not be functional in VMs"
>  fi
>
> -case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
> +case `test $vendor = Intel && grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
>    grep -E "^model\s*:" /proc/cpuinfo | head -n1` in
>  model*:\ 46|\
>  model*:\ 30|\
> @@ -82,6 +90,8 @@ model*:\ 126|\
>  model*:\ 167|\
>  model*:\ 140|\
>  model*:\ 141|\
> +model*:\ 143|\
> +model*:\ 207|\
>  model*:\ 106|\
>  model*:\ 108|\
>  model*:\ 173|\
> @@ -89,15 +99,20 @@ model*:\ 174) E="cpu/event=0xc4,umask=0x20/$FLAGS" ;;
>  model*:\ 134|\
>  model*:\ 150|\
>  model*:\ 156) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;;
> -model*:\ 143|\
> -model*:\ 207) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;;
> -model*:\ 190) E="cpu/event=0xc4,umask=0xc0/$FLAGS" ;;
> +model*:\ 190|\
> +model*:\ 175|\
> +model*:\ 182) E="cpu/event=0xc4,umask=0xc0/$FLAGS" ;;
>  model*:\ 190) E="cpu/event=0xc4,umask=0xfe/$FLAGS" ;;
>  *)
>  if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; then
>      E=br_inst_retired.near_taken:p
> +  elif perf list ex_ret_brn_tkn | grep -q ex_ret_brn_tkn ; then
> +    E=ex_ret_brn_tkn:P$FLAGS
> +  elif test $vendor = Intel ; then
> +echo >&2 "Unknown Intel CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
> +    exit 1
>  else
> -echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
> +echo >&2 "AMD CPU without support for ex_ret_brn_tkn event"
>      exit 1
>  fi ;;
>  esac
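
For reference, the vendor check that the quoted patch adds to the generated script can be sketched in Python, which may make it easier to exercise from a test without the right hardware. This is only a rough sketch: the function name `detect_vendor` and the use of exceptions are my own assumptions (the real script prints to stderr and exits 1), and the substring checks simply mirror the `grep -q` calls in the patch.

```python
def detect_vendor(cpuinfo: str) -> str:
    """Return "AMD" or "Intel" for the given /proc/cpuinfo text.

    Hypothetical helper, not part of GCC: it mirrors the shell logic in
    the quoted patch, raising RuntimeError where the script would exit 1.
    """
    if "AuthenticAMD" in cpuinfo:
        # The patch requires the " brs" flag (Zen 3) or "amd_lbr_v2"
        # (Zen 4 and later) before accepting an AMD CPU.
        if " brs" not in cpuinfo and "amd_lbr_v2" not in cpuinfo:
            raise RuntimeError(
                "AMD CPU with brs (Zen 3) or amd_lbr_v2 (Zen 4+) "
                "feature is required"
            )
        return "AMD"
    if "Intel" in cpuinfo:
        # "GenuineIntel" contains "Intel", the same way grep -q matches it.
        return "Intel"
    raise RuntimeError("Only AMD and Intel CPUs supported")
```

On a Linux box it could be called as `detect_vendor(open("/proc/cpuinfo").read())`; passing the text in as a string keeps the check testable on machines without branch sampling support.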