> 
> 
> > On 26 May 2025, at 5:34 pm, Jan Hubicka <hubi...@ucw.cz> wrote:
> > 
> > Hi,
> > also, please, can you add a testcase?  We should have some coverage for
> > auto-fdo specific issues.
> I was looking for this too. AFAIK we don't do any testing currently.
> We could:
> 
> 1. Add gcov files as part of the test. However, this would make updating gcov
> versions difficult.
> 2. Add an execution test that also uses the autofdo tools to generate .gcov.
> This would make the tests slow.
> Also, we may not be able to match exact profile values, and could only check
> that afdo annotations are there.

There is testsuite coverage, but it is currently enabled only for Intel
based x86_64 CPUs and I think no one runs it regularly.  To get AutoFDO
into good shape we definitely need to enable it on more setups and also
start testing/benchmarking regularly.

For a long time I had no easy access to a CPU with AutoFDO support, but
now I have a zen3 based desktop and also use a zen5 based box for testing.
I think the attached patch makes the testsuite do the right thing on AMD
Zen 3, 4 and 5.
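
The vendor and feature check that the patch adds to the generated shell script can be restated in Python for experimenting away from a real /proc/cpuinfo. This is only a sketch of the same substring tests the patch performs with grep; the function name autofdo_vendor is made up for illustration and is not part of gen_autofdo_event.py.

```python
def autofdo_vendor(cpuinfo):
    """Return "AMD" or "Intel" for a /proc/cpuinfo dump, or raise ValueError.

    Mirrors the grep-based checks in the patch: plain substring matches,
    with AMD tested before Intel.
    """
    if "AuthenticAMD" in cpuinfo:
        # The patch requires the "brs" flag (Zen 3) or "amd_lbr_v2" (Zen 4+).
        if " brs" not in cpuinfo and "amd_lbr_v2" not in cpuinfo:
            raise ValueError(
                "AMD CPU with brs (Zen 3) or amd_lbr_v2 (Zen 4+) feature is required")
        return "AMD"
    if "Intel" in cpuinfo:
        return "Intel"
    raise ValueError("Only AMD and Intel CPUs supported")
```

On a Linux box it could be called as autofdo_vendor(open("/proc/cpuinfo").read()).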

I get the following failures on Zen5:
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining add1/1 into main/4."
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining sub1/2 into main/4."
FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function ..;"
FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"

while on an Intel CPU I get:
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining add1/1 into main/4."
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining sub1/2 into main/4."
FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function ..;"
FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled likely exits: likely decreased number of iterations of loop 1"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled all exits: decreased number of iterations of loop 2"
FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-tree-dump-not optimized "Invalid sum"

I have not yet dug into where the differences come from.
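
As a starting point for that comparison, the two FAIL lists can be diffed mechanically. The sketch below is not part of the testsuite; the helper name fail_lines is invented here, and the data is just the two lists quoted in this mail with wrapped lines rejoined. Every Zen5 failure also shows up on Intel, so the set difference isolates the four Intel-only ones.

```python
def fail_lines(summary):
    """Return the set of 'FAIL:' lines found in a DejaGnu-style summary."""
    return {line.strip() for line in summary.splitlines()
            if line.startswith("FAIL: ")}

ZEN5 = """\
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining add1/1 into main/4."
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining sub1/2 into main/4."
FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function ..;"
FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"
"""

# The Intel list in the mail repeats the five Zen5 failures and adds these.
INTEL = ZEN5 + """\
FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled likely exits: likely decreased number of iterations of loop 1"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled all exits: decreased number of iterations of loop 2"
FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-tree-dump-not optimized "Invalid sum"
"""

zen5, intel = fail_lines(ZEN5), fail_lines(INTEL)
only_intel = sorted(intel - zen5)   # failures seen on Intel but not on Zen5
only_zen5 = sorted(zen5 - intel)    # failures seen on Zen5 but not on Intel

for line in only_intel:
    print("Intel only:", line)
for line in only_zen5:
    print("Zen5 only:", line)
```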

Andy, does the patch make sense to you?  I simply followed the kernel's
auto-fdo instructions for clang and built the current git version of
create_gcov.  In the past I always had trouble getting create_gcov
working with the version of perf distributed by openSUSE, but this time it
seems to work even though it complains:

[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1322] Skipping 4228 bytes of metadata: HEADER_CPU_TOPOLOGY
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_ID_INDEX
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_EVENT_UPDATE
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_CPU_MAP
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_82
[INFO:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1060] Number of events stored: 2178
[INFO:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_parser.cc:272] Parser processed: 5 MMAP/MMAP2 events, 2 COMM events, 0 FORK events, 1 EXIT events, 2108 SAMPLE events, 2099 of these were mapped, 0 SAMPLE events with a data address, 0 of these were mapped
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20250525 22:10:18.478610 1692721 sample_reader.cc:289] No buildid found in binary
W20250525 22:10:18.479000 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=4
W20250525 22:10:18.479007 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1057->0 index=2
W20250525 22:10:18.479010 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=6
W20250525 22:10:18.479013 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=6
W20250525 22:10:18.479017 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1057->0 index=8
W20250525 22:10:18.479019 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=c
I20250525 22:10:18.479228 1692721 symbol_map.cc:477] Adding loadable exec segment: offset=1000 vaddr=401000
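
To judge whether the "Bogus LBR data" warnings are noise or a pattern, it may help to tally them per address pair. The following is a rough sketch that only assumes the message format shown in the log above; the function name bogus_lbr_ranges is hypothetical and this is not a create_gcov feature. Addresses are kept as strings because the log does not state their base.

```python
import re
from collections import Counter

# Matches the warning text exactly as it appears in the log above.
BOGUS_LBR = re.compile(
    r"Bogus LBR data \(range is negative\): ([0-9a-f]+)->([0-9a-f]+) index=([0-9a-f]+)")

def bogus_lbr_ranges(stderr_text):
    """Count 'Bogus LBR data' warnings per (from, to) pair."""
    # Collapse whitespace first so lines wrapped by a mail client still match.
    flat = " ".join(stderr_text.split())
    return Counter((m.group(1), m.group(2)) for m in BOGUS_LBR.finditer(flat))

SAMPLE = """\
W20250525 22:10:18.479000 1692721 sample_reader.cc:345] Bogus LBR data (range
is negative): 1050->0 index=4
W20250525 22:10:18.479007 1692721 sample_reader.cc:345] Bogus LBR data (range
is negative): 1057->0 index=2
W20250525 22:10:18.479010 1692721 sample_reader.cc:345] Bogus LBR data (range
is negative): 1050->0 index=6
"""

for (src, dst), count in bogus_lbr_ranges(SAMPLE).most_common():
    print(f"{src}->{dst}: {count}")
```

In the full log above all six warnings have 0 as the second address, with only two distinct first addresses (1050 and 1057).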

Has anyone run SPEC recently? I made an auto-FDO SPEC config and tested
-Ofast with ipa-icf, ipa-cp-clone and ipa-sra disabled (to get rid of
the clone merging).  I get results roughly comparable to those without any
profile at all.  This is actually not _that_ bad a start - it means that the
data is probably not completely bogus, just not very useful :)
(Without disabling ipa-cp, exchange, for example, regresses a lot since
all profile info of the hot clone is lost).

About the pre-ipa and post-ipa clone issues: I think we may need to list
the names of late-created clones that we want to drop and keep that list up
to date, perhaps by inventing a clones.def file...
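
To make the idea concrete, here is a very rough sketch of what consulting such a list could look like when mapping profiled symbol names back to source functions. Everything in it is hypothetical: no clones.def exists, the function name is invented, and the clone kinds passed in the example are placeholders rather than a statement about which clones are actually created late. It only assumes the usual GCC naming scheme of a ".kind" or ".kind.N" suffix.

```python
import re

def strip_clone_suffixes(symbol, kinds):
    """Strip trailing '.<kind>' or '.<kind>.<N>' parts whose kind is in `kinds`.

    `kinds` stands in for the contents of a hypothetical clones.def list.
    Suffixes of other kinds are left alone.
    """
    if not kinds:
        return symbol
    pattern = re.compile(
        r"\.(?:%s)(?:\.\d+)?$" % "|".join(re.escape(k) for k in kinds))
    while True:
        stripped = pattern.sub("", symbol)
        if stripped == symbol:
            return symbol
        symbol = stripped

# Placeholder list for illustration only.
EXAMPLE_KINDS = ("constprop", "isra")

print(strip_clone_suffixes("foo.constprop.0.isra.0", EXAMPLE_KINDS))
print(strip_clone_suffixes("foo.part.0", EXAMPLE_KINDS))
```

Keeping the list as data rather than hard-coding it in the matcher is what would let it be kept up to date in one place.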

Honza

contrib/ChangeLog:

        * gen_autofdo_event.py: Add support for AMD Zen 3 and
        later CPUs.

gcc/ChangeLog:

        * config/i386/gcc-auto-profile: Regenerate.

diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
index 4364e5ce072..b1d373f82fe 100755
--- a/contrib/gen_autofdo_event.py
+++ b/contrib/gen_autofdo_event.py
@@ -138,8 +138,16 @@ if [ "$1" = "--all" ] ; then
   shift
 fi
 
-if ! grep -q Intel /proc/cpuinfo ; then
-  echo >&2 "Only Intel CPUs supported"
+if grep -q AuthenticAMD /proc/cpuinfo ; then
+  vendor=AMD
+  if ! grep -q " brs" /proc/cpuinfo && ! grep -q amd_lbr_v2 /proc/cpuinfo ; then
+    echo >&2 "AMD CPU with brs (Zen 3) or amd_lbr_v2 (Zen 4+) feature is required"
+    exit 1
+  fi
+elif grep -q Intel /proc/cpuinfo ; then
+  vendor=Intel
+else
+  echo >&2 "Only AMD and Intel CPUs supported"
   exit 1
 fi
 
@@ -147,7 +155,7 @@ if grep -q hypervisor /proc/cpuinfo ; then
   echo >&2 "Warning: branch profiling may not be functional in VMs"
 fi
 
-case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
+case `test $vendor = Intel && grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
   grep -E "^model\s*:" /proc/cpuinfo | head -n1` in''')
     for event, mod in eventmap.items():
         for m in mod[:-1]:
@@ -156,8 +164,13 @@ case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
     print(r'''*)
         if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; then
             E=br_inst_retired.near_taken:p
+        elif perf list ex_ret_brn_tkn | grep -q ex_ret_brn_tkn ; then
+            E=ex_ret_brn_tkn:P$FLAGS
+        elif test $vendor = Intel ; then
+echo >&2 "Unknown Intel CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
+         exit 1
         else
-echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
+echo >&2 "AMD CPU without support for ex_ret_brn_tkn event"
          exit 1
         fi ;;''')
     print(r"esac")
diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
index 528b34e4240..0e9e5fec2fe 100755
--- a/gcc/config/i386/gcc-auto-profile
+++ b/gcc/config/i386/gcc-auto-profile
@@ -24,8 +24,16 @@ if [ "$1" = "--all" ] ; then
   shift
 fi
 
-if ! grep -q Intel /proc/cpuinfo ; then
-  echo >&2 "Only Intel CPUs supported"
+if grep -q AuthenticAMD /proc/cpuinfo ; then
+  vendor=AMD
+  if ! grep -q " brs" /proc/cpuinfo && ! grep -q amd_lbr_v2 /proc/cpuinfo ; then
+    echo >&2 "AMD CPU with brs (Zen 3) or amd_lbr_v2 (Zen 4+) feature is required"
+    exit 1
+  fi
+elif grep -q Intel /proc/cpuinfo ; then
+  vendor=Intel
+else
+  echo >&2 "Only AMD and Intel CPUs supported"
   exit 1
 fi
 
@@ -33,7 +41,7 @@ if grep -q hypervisor /proc/cpuinfo ; then
   echo >&2 "Warning: branch profiling may not be functional in VMs"
 fi
 
-case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
+case `test $vendor = Intel && grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
   grep -E "^model\s*:" /proc/cpuinfo | head -n1` in
 model*:\ 46|\
 model*:\ 30|\
@@ -82,6 +90,8 @@ model*:\ 126|\
 model*:\ 167|\
 model*:\ 140|\
 model*:\ 141|\
+model*:\ 143|\
+model*:\ 207|\
 model*:\ 106|\
 model*:\ 108|\
 model*:\ 173|\
@@ -89,15 +99,20 @@ model*:\ 174) E="cpu/event=0xc4,umask=0x20/$FLAGS" ;;
 model*:\ 134|\
 model*:\ 150|\
 model*:\ 156) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;;
-model*:\ 143|\
-model*:\ 207) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;;
-model*:\ 190) E="cpu/event=0xc4,umask=0xc0/$FLAGS" ;;
+model*:\ 190|\
+model*:\ 175|\
+model*:\ 182) E="cpu/event=0xc4,umask=0xc0/$FLAGS" ;;
 model*:\ 190) E="cpu/event=0xc4,umask=0xfe/$FLAGS" ;;
 *)
         if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; then
             E=br_inst_retired.near_taken:p
+        elif perf list ex_ret_brn_tkn | grep -q ex_ret_brn_tkn ; then
+            E=ex_ret_brn_tkn:P$FLAGS
+        elif test $vendor = Intel ; then
+echo >&2 "Unknown Intel CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
+         exit 1
         else
-echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
+echo >&2 "AMD CPU without support for ex_ret_brn_tkn event"
          exit 1
         fi ;;
 esac
