[Bug target/115842] [15/16 Regression] 6.5% slowdown of 548.exchange2_r on Intel Ice Lake

liuhongt at gcc dot gnu.org via Gcc-bugs Thu, 12 Jun 2025 17:38:29 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115842


--- Comment #11 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #9)
> (In reply to Hongtao Liu from comment #8)
> > (In reply to Tamar Christina from comment #7)
> > > (In reply to Hongtao Liu from comment #6)
> > > >  I noticed some double-counting of cost in group-candidate (regarding
> > > > loop invariant expressions), this modification reduces the number of
> > > > instructions executed by ~8% for exchange_r binary compiled with
> > > > -march=x86-64-v3 -O2.
> > > > 
> > > 
> > > Note that this patch causes regressions on AArch64.  While exchange
> > > improves slightly I see regressions in: leela, -5%, mcf, xz, x264,
> > > deepsjeng -2%, geomean -1%
> > 
> > What options did you use? We have an AmpereOne machine and would like to
> > see whether it's reproducible on it.
> 
> This was on Neoverse-V2, but probably reproducible on AmpereOne, the flags
> were -mcpu=native -Ofast -fomit-frame-pointer -flto=auto

I tested my patch against the latest trunk with the same options, and I can't
reproduce those regressions on AWS Graviton4.

500.perlbench_r         0.10%
502.gcc_r               0.00%
505.mcf_r               0.00%
520.omnetpp_r           0.00%
523.xalancbmk_r         0.00%
525.x264_r              0.00%
531.deepsjeng_r         0.20%
541.leela_r             0.00%
548.exchange2_r         0.00%
557.xz_r                0.00%
503.bwaves_r            0.30%
507.cactuBSSN_r         0.00%
508.namd_r              0.30%
510.parest_r            0.20%
511.povray_r            -1.00% (code alignment issue)
519.lbm_r               0.10%
521.wrf_r               0.00%
526.blender_r           -0.20%
527.cam4_r              0.00%
538.imagick_r           0.00%
544.nab_r               0.10%
549.fotonik3d_r         -0.40%
554.roms_r              0.00%
Geomean-int             0.00%
Geomean-fp              0.00%
Geomean-all             0.00%
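
As a rough cross-check of the geomean rows, here is a minimal Python sketch.
It assumes each figure above is a percentage change relative to the baseline
and that the geomean is taken over the ratios (1 + d/100); neither is stated
in the table, and the names INT_DELTAS, FP_DELTAS and geomean_delta are
illustrative only.

```python
import math

# Per-benchmark deltas in percent, copied from the table above.
# Assumption: each value is a relative change versus the unpatched baseline.
INT_DELTAS = [0.10, 0.00, 0.00, 0.00, 0.00, 0.00, 0.20, 0.00, 0.00, 0.00]
FP_DELTAS = [0.30, 0.00, 0.30, 0.20, -1.00, 0.10, 0.00,
             -0.20, 0.00, 0.00, 0.10, -0.40, 0.00]


def geomean_delta(deltas_pct):
    """Geometric mean of the ratios (1 + d/100), returned as a percent delta."""
    log_sum = sum(math.log1p(d / 100.0) for d in deltas_pct)
    return math.expm1(log_sum / len(deltas_pct)) * 100.0


if __name__ == "__main__":
    for name, deltas in (("Geomean-int", INT_DELTAS),
                         ("Geomean-fp", FP_DELTAS),
                         ("Geomean-all", INT_DELTAS + FP_DELTAS)):
        print(f"{name:<16}{geomean_delta(deltas):+.2f}%")
```

Under those assumptions all three geomeans come out below 0.1% in magnitude,
i.e. no measurable overall change.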


Architecture:                         aarch64                   
CPU op-mode(s):                       64-bit                    
Byte Order:                           Little Endian                     
CPU(s):                               96                        
On-line CPU(s) list:                  0-95                      
Vendor ID:                            ARM                       
Model name:                           Neoverse-V2                       
Model:                                1                 
Thread(s) per core:                   1                 
Core(s) per socket:                   96                        
Socket(s):                            1                 
Stepping:                             r0p1                      
BogoMIPS:                             2000.00                   
Flags:                                fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                            6 MiB (96 instances)                      
L1i cache:                            6 MiB (96 instances)                      
L2 cache:                             192 MiB (96 instances)                    
L3 cache:                             36 MiB (1 instance)                       
NUMA node(s):                         1                 
NUMA node0 CPU(s):                    0-95                      
Vulnerability Gather data sampling:   Not affected                      
Vulnerability Itlb multihit:          Not affected                      
Vulnerability L1tf:                   Not affected                      
Vulnerability Mds:                    Not affected                      
Vulnerability Meltdown:               Not affected                      
Vulnerability Mmio stale data:        Not affected                      
Vulnerability Reg file data sampling: Not affected                      
Vulnerability Retbleed:               Not affected                      
Vulnerability Spec rstack overflow:   Not affected                      
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; __user pointer sanitization   
Vulnerability Spectre v2:             Not affected                      
Vulnerability Srbds:                  Not affected                      
Vulnerability Tsx async abort:        Not affected

[Bug target/115842] [15/16 Regression] 6.5% slowdown of 548.exchange2_r on Intel Ice Lake
