https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115842
--- Comment #11 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #9)
> (In reply to Hongtao Liu from comment #8)
> > (In reply to Tamar Christina from comment #7)
> > > (In reply to Hongtao Liu from comment #6)
> > > > I noticed some double-counting of cost in group-candidate (regarding
> > > > loop invariant expressions), this modification reduces the number of
> > > > instructions executed by ~8% for exchange_r binary compiled with
> > > > -march=x86-64-v3 -O2.
> > >
> > > Note that this patch causes regressions on AArch64. While exchange
> > > improves slightly I see regressions in: leela, -5%, mcf, xz, x264,
> > > deepsjeng -2%, geomean -1%
> >
> > What options do you use, we have an AmpereOne machine, like to try to see
> > if it's reproduciable on it.
>
> This was on Neoverse-V2, but probably reproducible on AmpereOne, the flags
> was -mcpu=native -Ofast -fomit-framepointer -flto=auto

I tested my patch against the latest trunk with the same options, and I can't
reproduce those regressions on AWS Graviton4.

500.perlbench_r       0.10%
502.gcc_r             0.00%
505.mcf_r             0.00%
520.omnetpp_r         0.00%
523.xalancbmk_r       0.00%
525.x264_r            0.00%
531.deepsjeng_r       0.20%
541.leela_r           0.00%
548.exchange2_r       0.00%
557.xz_r              0.00%
503.bwaves_r          0.30%
507.cactuBSSN_r       0.00%
508.namd_r            0.30%
510.parest_r          0.20%
511.povray_r         -1.00% (code alignment issue)
519.lbm_r             0.10%
521.wrf_r             0.00%
526.blender_r        -0.20%
527.cam4_r            0.00%
538.imagick_r         0.00%
544.nab_r             0.10%
549.fotonik3d_r      -0.40%
554.roms_r            0.00%
Geomean-int           0.00%
Geomean-fp            0.00%
Geomean-all           0.00%

Architecture:                         aarch64
CPU op-mode(s):                       64-bit
Byte Order:                           Little Endian
CPU(s):                               96
On-line CPU(s) list:                  0-95
Vendor ID:                            ARM
Model name:                           Neoverse-V2
Model:                                1
Thread(s) per core:                   1
Core(s) per socket:                   96
Socket(s):                            1
Stepping:                             r0p1
BogoMIPS:                             2000.00
Flags:                                fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                            6 MiB (96 instances)
L1i cache:                            6 MiB (96 instances)
L2 cache:                             192 MiB (96 instances)
L3 cache:                             36 MiB (1 instance)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-95
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; __user pointer sanitization
Vulnerability Spectre v2:             Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected
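
For anyone wanting to sanity-check the geomean rows against the per-benchmark deltas listed in this comment, a minimal sketch follows. It assumes each delta is a relative change in percent and that the geomean is taken over the ratios (1 + delta/100); the original numbers were most likely computed from raw SPEC rates, which are not shown here, so this only checks consistency at the printed precision. The function name geomean_delta is made up for illustration.

```python
import math

def geomean_delta(pct_deltas):
    """Geometric mean of relative changes given in percent, returned in percent.

    Assumption: each delta d corresponds to a ratio of 1 + d/100.
    """
    ratios = [1.0 + d / 100.0 for d in pct_deltas]
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0

# Deltas copied from the results in this comment, in listed order.
int_deltas = [0.10, 0.00, 0.00, 0.00, 0.00, 0.00, 0.20, 0.00, 0.00, 0.00]
fp_deltas = [0.30, 0.00, 0.30, 0.20, -1.00, 0.10, 0.00,
             -0.20, 0.00, 0.00, 0.10, -0.40, 0.00]

if __name__ == "__main__":
    for name, deltas in (("int", int_deltas),
                         ("fp", fp_deltas),
                         ("all", int_deltas + fp_deltas)):
        print(f"Geomean-{name}: {geomean_delta(deltas):+.3f}%")
```

Under these assumptions all three geomeans stay below 0.05% in magnitude, which is consistent with the reported values at the one-decimal precision of the per-benchmark deltas.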