Hi Honza.

> On 26 May 2025, at 7:48 pm, Jan Hubicka <hubi...@ucw.cz> wrote:
> 
> 
>> 
>> 
>>> On 26 May 2025, at 5:34 pm, Jan Hubicka <hubi...@ucw.cz> wrote:
>>> 
>>> 
>>> Hi,
>>> also, please, can you add a testcase?  We should have some coverage for
>>> AutoFDO-specific issues.
>> I was looking for this too. AFAIK we don't do any testing currently.
>> We could:
>> 
>> 1. Add .gcov files as part of the test. However, this would make updating
>> gcov versions difficult.
>> 2. Add execution tests that also use the AutoFDO tools to generate .gcov.
>> This would make them slow. Also, we may not be able to match exact profile
>> values, only check that the AFDO annotations are there.
> 
> There is testsuite coverage, but it is currently enabled only for
> Intel-based x86_64 CPUs, and I think no one runs it regularly.  To get
> AutoFDO into good shape we definitely need to enable it on more setups
> and also start testing/benchmarking regularly.
I will look into this. We also want to enable it for aarch64.
> 
> For a long time I had no easy access to a CPU with AutoFDO support, but
> now I have a Zen 3-based desktop and also use a Zen 5-based box for testing.
> I think the attached patch makes the testsuite do the right thing on AMD
> Zen 3, 4 and 5.
> 
> I get the following failures on Zen 5:
> FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining add1/1 into main/4."
> FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining sub1/2 into main/4."
> FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function ..;"
> FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
> FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"
> 
> while on an Intel CPU I get:
> FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining add1/1 into main/4."
> FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining sub1/2 into main/4."
> FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-tree-dump-not optimized "Invalid sum"
> FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function ..;"
> FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
> FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"
> FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled likely exits: likely decreased number of iterations of loop 1"
> FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled all exits: decreased number of iterations of loop 2"
> FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-tree-dump-not optimized "Invalid sum"
> 
> I have not yet dug into where the differences come from.
> 
> Andy, does the patch make sense to you?  I simply followed the kernel's
> AutoFDO instructions for clang and built the current git version of
> create_gcov.  In the past I always had trouble getting create_gcov
> working with the version of perf distributed by openSUSE, but this time
> it seems to work, even though it complains:
> 
> [WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1322] Skipping 4228 bytes of metadata: HEADER_CPU_TOPOLOGY
> [WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_ID_INDEX
> [WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_EVENT_UPDATE
> [WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_CPU_MAP
> [WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_82
> [INFO:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1060] Number of events stored: 2178
> [INFO:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_parser.cc:272] Parser processed: 5 MMAP/MMAP2 events, 2 COMM events, 0 FORK events, 1 EXIT events, 2108 SAMPLE events, 2099 of these were mapped, 0 SAMPLE events with a data address, 0 of these were mapped
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I20250525 22:10:18.478610 1692721 sample_reader.cc:289] No buildid found in binary
> W20250525 22:10:18.479000 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=4
> W20250525 22:10:18.479007 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1057->0 index=2
> W20250525 22:10:18.479010 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=6
> W20250525 22:10:18.479013 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=6
> W20250525 22:10:18.479017 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1057->0 index=8
> W20250525 22:10:18.479019 1692721 sample_reader.cc:345] Bogus LBR data (range is negative): 1050->0 index=c
> I20250525 22:10:18.479228 1692721 symbol_map.cc:477] Adding loadable exec segment: offset=1000 vaddr=401000
> 
> Has anyone run SPEC recently?  I made an AutoFDO SPEC config and tested
> -Ofast with ipa-icf, ipa-cp-clone and ipa-sra disabled (to get rid of
> the clone merging).  I get results roughly comparable to building without
> a profile at all.  This is actually not _that_ bad a start: it means the
> data is probably not completely bogus, just not very useful :)
> (Without disabling ipa-cp, for example, exchange regresses a lot since
> all profile info for the hot clone is lost.)

We did test GCC AutoFDO with SPE profiling (using some local patches for the
AutoFDO tools) and with BRBE on aarch64, comparing it against PGO.
The cloning issue shows up with 648.exchange2_s.

Thanks,
Kugan
> 
> About the pre-IPA and post-IPA clone issues, I think we may need to keep
> an up-to-date list of the names of late-created clones that we want to
> drop, perhaps by inventing a clones.def file...
> 
> Honza
> 
> contrib/ChangeLog:
> 
>        * gen_autofdo_event.py: Add support for AMD Zen 3 and
>        later CPUs.
> 
> gcc/ChangeLog:
> 
>        * config/i386/gcc-auto-profile: Regenerate.
> 
> diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
> index 4364e5ce072..b1d373f82fe 100755
> --- a/contrib/gen_autofdo_event.py
> +++ b/contrib/gen_autofdo_event.py
> @@ -138,8 +138,16 @@ if [ "$1" = "--all" ] ; then
>   shift
> fi
> 
> -if ! grep -q Intel /proc/cpuinfo ; then
> -  echo >&2 "Only Intel CPUs supported"
> +if grep -q AuthenticAMD /proc/cpuinfo ; then
> +  vendor=AMD
> +  if ! grep -q " brs" /proc/cpuinfo && ! grep -q amd_lbr_v2 /proc/cpuinfo ; then
> +    echo >&2 "AMD CPU with brs (Zen 3) or amd_lbr_v2 (Zen 4+) feature is required"
> +    exit 1
> +  fi
> +elif grep -q Intel /proc/cpuinfo ; then
> +  vendor=Intel
> +else
> +  echo >&2 "Only AMD and Intel CPUs supported"
>   exit 1
> fi
> 
> @@ -147,7 +155,7 @@ if grep -q hypervisor /proc/cpuinfo ; then
>   echo >&2 "Warning: branch profiling may not be functional in VMs"
> fi
> 
> -case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
> +case `test $vendor = Intel && grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
>   grep -E "^model\s*:" /proc/cpuinfo | head -n1` in''')
>     for event, mod in eventmap.items():
>         for m in mod[:-1]:
> @@ -156,8 +164,13 @@ case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
>     print(r'''*)
>         if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; then
>             E=br_inst_retired.near_taken:p
> +        elif perf list ex_ret_brn_tkn | grep -q ex_ret_brn_tkn ; then
> +            E=ex_ret_brn_tkn:P$FLAGS
> +        elif test $vendor = Intel ; then
> +echo >&2 "Unknown Intel CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
> +         exit 1
>         else
> -echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
> +echo >&2 "AMD CPU without support for ex_ret_brn_tkn event"
>          exit 1
>         fi ;;''')
>     print(r"esac")
> diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
> index 528b34e4240..0e9e5fec2fe 100755
> --- a/gcc/config/i386/gcc-auto-profile
> +++ b/gcc/config/i386/gcc-auto-profile
> @@ -24,8 +24,16 @@ if [ "$1" = "--all" ] ; then
>   shift
> fi
> 
> -if ! grep -q Intel /proc/cpuinfo ; then
> -  echo >&2 "Only Intel CPUs supported"
> +if grep -q AuthenticAMD /proc/cpuinfo ; then
> +  vendor=AMD
> +  if ! grep -q " brs" /proc/cpuinfo && ! grep -q amd_lbr_v2 /proc/cpuinfo ; then
> +    echo >&2 "AMD CPU with brs (Zen 3) or amd_lbr_v2 (Zen 4+) feature is required"
> +    exit 1
> +  fi
> +elif grep -q Intel /proc/cpuinfo ; then
> +  vendor=Intel
> +else
> +  echo >&2 "Only AMD and Intel CPUs supported"
>   exit 1
> fi
> 
> @@ -33,7 +41,7 @@ if grep -q hypervisor /proc/cpuinfo ; then
>   echo >&2 "Warning: branch profiling may not be functional in VMs"
> fi
> 
> -case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
> +case `test $vendor = Intel && grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
>   grep -E "^model\s*:" /proc/cpuinfo | head -n1` in
> model*:\ 46|\
> model*:\ 30|\
> @@ -82,6 +90,8 @@ model*:\ 126|\
> model*:\ 167|\
> model*:\ 140|\
> model*:\ 141|\
> +model*:\ 143|\
> +model*:\ 207|\
> model*:\ 106|\
> model*:\ 108|\
> model*:\ 173|\
> @@ -89,15 +99,20 @@ model*:\ 174) E="cpu/event=0xc4,umask=0x20/$FLAGS" ;;
> model*:\ 134|\
> model*:\ 150|\
> model*:\ 156) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;;
> -model*:\ 143|\
> -model*:\ 207) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;;
> -model*:\ 190) E="cpu/event=0xc4,umask=0xc0/$FLAGS" ;;
> +model*:\ 190|\
> +model*:\ 175|\
> +model*:\ 182) E="cpu/event=0xc4,umask=0xc0/$FLAGS" ;;
> model*:\ 190) E="cpu/event=0xc4,umask=0xfe/$FLAGS" ;;
> *)
>         if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; then
>             E=br_inst_retired.near_taken:p
> +        elif perf list ex_ret_brn_tkn | grep -q ex_ret_brn_tkn ; then
> +            E=ex_ret_brn_tkn:P$FLAGS
> +        elif test $vendor = Intel ; then
> +echo >&2 "Unknown Intel CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
> +         exit 1
>         else
> -echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
> +echo >&2 "AMD CPU without support for ex_ret_brn_tkn event"
>          exit 1
>         fi ;;
> esac
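For readers following the shell logic in the patch, here is a rough Python sketch of the same decision flow: vendor detection from /proc/cpuinfo text, the AMD brs/amd_lbr_v2 feature requirement, and the fallback event choice used when the CPU model is not in the explicit table. The function names detect_vendor and fallback_event are hypothetical and exist only for illustration; the real script is plain shell using grep and perf list. One assumed simplification: this sketch matches whole tokens on the "flags" lines, whereas the script's grep -q " brs" is a substring match.

```python
# Hypothetical sketch of the vendor/event selection in the patched
# gcc-auto-profile script. Names are illustrative, not part of GCC.

def detect_vendor(cpuinfo: str) -> str:
    """Return "AMD" or "Intel" from /proc/cpuinfo text, or raise."""
    if "AuthenticAMD" in cpuinfo:
        # Gather feature flags from every "flags : ..." line.
        flags = set()
        for line in cpuinfo.splitlines():
            if line.startswith("flags"):
                flags.update(line.split(":", 1)[1].split())
        if "brs" not in flags and "amd_lbr_v2" not in flags:
            raise RuntimeError("AMD CPU with brs (Zen 3) or amd_lbr_v2 "
                               "(Zen 4+) feature is required")
        return "AMD"
    if "Intel" in cpuinfo:
        return "Intel"
    raise RuntimeError("Only AMD and Intel CPUs supported")


def fallback_event(vendor: str, perf_events: set, flags: str = "") -> str:
    """Event for CPUs not covered by the explicit model table.

    perf_events stands in for what `perf list` reports (an assumption).
    """
    if "br_inst_retired.near_taken" in perf_events:
        return "br_inst_retired.near_taken:p"
    if "ex_ret_brn_tkn" in perf_events:
        return "ex_ret_brn_tkn:P" + flags
    if vendor == "Intel":
        raise RuntimeError("Unknown Intel CPU. Run contrib/gen_autofdo_event.py"
                           " --all --script to update script.")
    raise RuntimeError("AMD CPU without support for ex_ret_brn_tkn event")
```

For example, a Zen 4 cpuinfo listing amd_lbr_v2 yields "AMD", and with ex_ret_brn_tkn available the fallback event is "ex_ret_brn_tkn:P" followed by whatever flags are appended.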

