> On 26 May 2025, at 5:34 pm, Jan Hubicka <hubi...@ucw.cz> wrote:
> >
> > Hi,
> > also, please, can you add a testcase? We should have some coverage for
> > auto-fdo specific issues....
>
> I was looking for this too. AFAIK we don't do any testing currently.
> We could
>
> 1. Add gcov files as part of the test. However, this would make updating gcov
> versions difficult.
> 2. We could add an execution test that also uses autofdo tools to generate .gcov.
> This would make them slow.
> Also we may not be able to match exact profile values and only see if afdo
> annotations are there.
There is testsuite coverage, but it is currently enabled only for Intel-based x86_64 CPUs, and I think no-one runs it regularly. To get AutoFDO into good shape we definitely need to enable it on more setups and also start testing/benchmarking regularly. For a long time I had no easy access to a CPU with AutoFDO support, but now I have a Zen 3 based desktop and also use a Zen 5 based box for testing. I think the attached patch makes the testsuite do the right thing on AMD Zen 3, 4 and 5.

I get the following failures on Zen 5:

FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining add1/1 into main/4."
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining sub1/2 into main/4."
FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function ..;"
FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"

while on an Intel CPU I get:

FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining add1/1 into main/4."
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining sub1/2 into main/4."
FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function ..;"
FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled likely exits: likely decreased number of iterations of loop 1"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled all exits: decreased number of iterations of loop 2"
FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-tree-dump-not optimized "Invalid sum"

I have not yet looked into where the differences come from.

Andy, does the patch make sense to you? I simply followed the kernel's AutoFDO instructions for clang and built the current git version of create_gcov.
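For anyone wanting to reproduce this, the AutoFDO round trip is roughly: build with debug info, sample the run with branch records, convert the samples with create_gcov, and rebuild with -fauto-profile. Below is a minimal sketch of that sequence, assuming gcc-auto-profile (gcc/config/i386/gcc-auto-profile, which passes its remaining arguments on to perf record) and create_gcov are on PATH. The helpers run and afdo_round_trip, the DRY_RUN switch and the file names are hypothetical; the create_gcov options follow the usage given in the GCC manual for -fauto-profile and may need adjusting for the create_gcov version you built.

```shell
#!/bin/sh
# Hypothetical sketch of an AutoFDO round trip with GCC.
# Assumes gcc-auto-profile and create_gcov are on PATH.
# With DRY_RUN=1 the commands are only printed, which lets the
# pipeline be checked on a machine without branch sampling.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

afdo_round_trip() {
  src=$1          # e.g. prog.c (hypothetical file name)
  exe=${src%.c}   # prog
  # 1. Build with debug info so samples can be mapped back to source lines.
  run gcc -O2 -g "$src" -o "$exe"
  # 2. Run the workload under perf with the CPU-specific taken-branch event.
  run gcc-auto-profile -o perf.data "./$exe"
  # 3. Convert the raw perf samples into a profile GCC can read.
  run create_gcov --binary="$exe" --profile=perf.data --gcov="$exe.gcov"
  # 4. Rebuild using the sampled profile.
  run gcc -O2 -fauto-profile="$exe.gcov" "$src" -o "$exe"
}
```

For example, DRY_RUN=1 afdo_round_trip prog.c only prints the four commands; dropping DRY_RUN runs them for real.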
In the past I always had trouble getting create_gcov to work with the version of perf distributed by openSUSE, but this time it seems to work even though it complains:

[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1322] Skipping 4228 bytes of metadata: HEADER_CPU_TOPOLOGY
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_ID_INDEX
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_EVENT_UPDATE
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_CPU_MAP
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_82
[INFO:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1060] Number of events stored: 2178
[INFO:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_parser.cc:272] Parser processed: 5 MMAP/MMAP2 events, 2 COMM events, 0 FORK events, 1 EXIT events, 2108 SAMPLE events, 2099 of these were mapped, 0 SAMPLE events with a data address, 0 of these were mapped
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20250525 22:10:18.478610 1692721 sample_reader.cc:289] No buildid found in binary
W20250525 22:10:18.479000 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=4
W20250525 22:10:18.479007 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1057->0 index=2
W20250525 22:10:18.479010 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=6
W20250525 22:10:18.479013 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=6
W20250525 22:10:18.479017 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1057->0 index=8
W20250525 22:10:18.479019 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=c
I20250525 22:10:18.479228 1692721 symbol_map.cc:477] Adding loadable exec segment: offset=1000 vaddr=401000

Did someone run SPEC recently? I made an AutoFDO SPEC config and tested -Ofast with ipa-icf, ipa-cp-clone and ipa-sra disabled (to get rid of the clone merging). I get roughly the same results as without a profile at all. This is actually not _that_ bad a start: it means that the data is probably not completely bogus, just not very useful :) (Without disabling ipa-cp, for example, exchange regresses a lot since all profile info for the hot clone is lost.)

About the pre-IPA and post-IPA clone issues: I think we may need to list the names of clones that are created late and that we want to drop, and keep that list up to date, perhaps by inventing a clones.def file...

Honza

contrib/ChangeLog:

	* gen_autofdo_event.py: Add support for AMD Zen 3 and later CPUs.

gcc/ChangeLog:

	* config/i386/gcc-auto-profile: Regenerate.

diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
index 4364e5ce072..b1d373f82fe 100755
--- a/contrib/gen_autofdo_event.py
+++ b/contrib/gen_autofdo_event.py
@@ -138,8 +138,16 @@ if [ "$1" = "--all" ] ; then
   shift
 fi
 
-if ! grep -q Intel /proc/cpuinfo ; then
-  echo >&2 "Only Intel CPUs supported"
+if grep -q AuthenticAMD /proc/cpuinfo ; then
+  vendor=AMD
+  if ! grep -q " brs" /proc/cpuinfo && ! grep -q amd_lbr_v2 /proc/cpuinfo ; then
+    echo >&2 "AMD CPU with brs (Zen 3) or amd_lbr_v2 (Zen 4+) feature is required"
+    exit 1
+  fi
+elif grep -q Intel /proc/cpuinfo ; then
+  vendor=Intel
+else
+  echo >&2 "Only AMD and Intel CPUs supported"
   exit 1
 fi
 
@@ -147,7 +155,7 @@ if grep -q hypervisor /proc/cpuinfo ; then
   echo >&2 "Warning: branch profiling may not be functional in VMs"
 fi
 
-case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
+case `test $vendor = Intel && grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
   grep -E "^model\s*:" /proc/cpuinfo | head -n1` in''')
 for event, mod in eventmap.items():
     for m in mod[:-1]:
@@ -156,8 +164,13 @@ case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
 print(r'''*)
   if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; then
     E=br_inst_retired.near_taken:p
+  elif perf list ex_ret_brn_tkn | grep -q ex_ret_brn_tkn ; then
+    E=ex_ret_brn_tkn:P$FLAGS
+  elif test $vendor = Intel ; then
+echo >&2 "Unknown Intel CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
+    exit 1
   else
-echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
+echo >&2 "AMD CPU without support for ex_ret_brn_tkn event"
     exit 1
   fi ;;''')
 print(r"esac")
diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
index 528b34e4240..0e9e5fec2fe 100755
--- a/gcc/config/i386/gcc-auto-profile
+++ b/gcc/config/i386/gcc-auto-profile
@@ -24,8 +24,16 @@ if [ "$1" = "--all" ] ; then
   shift
 fi
 
-if ! grep -q Intel /proc/cpuinfo ; then
-  echo >&2 "Only Intel CPUs supported"
+if grep -q AuthenticAMD /proc/cpuinfo ; then
+  vendor=AMD
+  if ! grep -q " brs" /proc/cpuinfo && ! grep -q amd_lbr_v2 /proc/cpuinfo ; then
+    echo >&2 "AMD CPU with brs (Zen 3) or amd_lbr_v2 (Zen 4+) feature is required"
+    exit 1
+  fi
+elif grep -q Intel /proc/cpuinfo ; then
+  vendor=Intel
+else
+  echo >&2 "Only AMD and Intel CPUs supported"
   exit 1
 fi
 
@@ -33,7 +41,7 @@ if grep -q hypervisor /proc/cpuinfo ; then
   echo >&2 "Warning: branch profiling may not be functional in VMs"
 fi
 
-case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
+case `test $vendor = Intel && grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
   grep -E "^model\s*:" /proc/cpuinfo | head -n1` in
 model*:\ 46|\
 model*:\ 30|\
@@ -82,6 +90,8 @@ model*:\ 126|\
 model*:\ 167|\
 model*:\ 140|\
 model*:\ 141|\
+model*:\ 143|\
+model*:\ 207|\
 model*:\ 106|\
 model*:\ 108|\
 model*:\ 173|\
@@ -89,15 +99,20 @@ model*:\ 174) E="cpu/event=0xc4,umask=0x20/$FLAGS" ;;
 model*:\ 134|\
 model*:\ 150|\
 model*:\ 156) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;;
-model*:\ 143|\
-model*:\ 207) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;;
-model*:\ 190) E="cpu/event=0xc4,umask=0xc0/$FLAGS" ;;
+model*:\ 190|\
+model*:\ 175|\
+model*:\ 182) E="cpu/event=0xc4,umask=0xc0/$FLAGS" ;;
 model*:\ 190) E="cpu/event=0xc4,umask=0xfe/$FLAGS" ;;
 *)
   if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; then
     E=br_inst_retired.near_taken:p
+  elif perf list ex_ret_brn_tkn | grep -q ex_ret_brn_tkn ; then
+    E=ex_ret_brn_tkn:P$FLAGS
+  elif test $vendor = Intel ; then
+echo >&2 "Unknown Intel CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
+    exit 1
   else
-echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
+echo >&2 "AMD CPU without support for ex_ret_brn_tkn event"
     exit 1
   fi ;;
 esac
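The vendor check and the fallback event choice added by the patch can be exercised without AutoFDO-capable hardware if the inputs are made explicit. This is a minimal POSIX sh sketch, not the script itself. detect_afdo_vendor and pick_fallback_event are hypothetical helpers, the cpuinfo path is a parameter so a fake file can be used, the event list stands in for what perf list would report, and the FLAGS default of u is an assumption.

```shell
#!/bin/sh
# Hypothetical helper mirroring the patch's vendor/feature checks.
# Reads a caller-supplied cpuinfo file (default /proc/cpuinfo).
# Prints "AMD" or "Intel"; prints an error to stderr and returns 1 otherwise.
detect_afdo_vendor() {
  cpuinfo=${1:-/proc/cpuinfo}
  if grep -q AuthenticAMD "$cpuinfo"; then
    # Zen 3 advertises "brs", Zen 4 and later advertise "amd_lbr_v2".
    if ! grep -q " brs" "$cpuinfo" && ! grep -q amd_lbr_v2 "$cpuinfo"; then
      echo >&2 "AMD CPU with brs (Zen 3) or amd_lbr_v2 (Zen 4+) feature is required"
      return 1
    fi
    echo AMD
  elif grep -q Intel "$cpuinfo"; then
    echo Intel
  else
    echo >&2 "Only AMD and Intel CPUs supported"
    return 1
  fi
}

# Hypothetical helper mirroring the "*)" fallback branch.
# $1: vendor; $2: space-separated event names perf would list.
pick_fallback_event() {
  vendor=$1
  events=" $2 "
  FLAGS=${FLAGS:-u}   # assumed default: user-space sampling only
  case $events in
    *" br_inst_retired.near_taken "*) echo "br_inst_retired.near_taken:p" ;;
    *" ex_ret_brn_tkn "*) echo "ex_ret_brn_tkn:P$FLAGS" ;;
    *)
      # "test" is required: a bare "$vendor = Intel" would try to run
      # a command named after the vendor instead of comparing strings.
      if test "$vendor" = Intel; then
        echo >&2 "Unknown Intel CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
      else
        echo >&2 "AMD CPU without support for ex_ret_brn_tkn event"
      fi
      return 1
      ;;
  esac
}
```

On a real machine, vendor=$(detect_afdo_vendor) || exit 1 performs the same gate as the patched script.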