https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89853
            Bug ID: 89853
           Summary: Regression of 525.x264_r at -O2 (and generic tuning) on AMD EPYC
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
  Target Milestone: ---

I have detected a 7% regression of 525.x264_r from SPEC INTrate 2017 at
-O2 and generic march/tuning on AMD EPYC (znver1) CPUs (I have not seen
it on an Intel CPU), compared to the gcc-8-branch.

I have bisected it to r264897.

With revision 264896 I get:

perf stat:

 Performance counter stats for 'numactl -C 0 -l specinvoke':

     495413.105450      task-clock:u (msec)       #    0.999 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             80572      page-faults:u             #    0.163 K/sec
     1573525941814      cycles:u                  #    3.176 GHz                      (83.33%)
       56730573392      stalled-cycles-frontend:u #    3.61% frontend cycles idle     (83.33%)
      397644125819      stalled-cycles-backend:u  #   25.27% backend cycles idle      (83.33%)
     5157395976259      instructions:u            #    3.28  insn per cycle
                                                  #    0.08  stalled cycles per insn  (83.33%)
      421019689027      branches:u                #  849.836 M/sec                    (83.33%)
       10705813341      branch-misses:u           #    2.54% of all branches          (83.33%)

     495.869208013 seconds time elapsed

perf report -n --percent-limit 2

# Event count (approx.): 1576108148398
#
# Overhead  Samples  Command      Shared Object   Symbol
# ........  .........  ...........  ..............  ............................
#
    14.20%   282290  x264_r_base  x264_r_base.mi  [.] x264_pixel_satd_8x4
    11.19%   222403  x264_r_base  x264_r_base.mi  [.] get_ref
    10.82%   215061  x264_r_base  x264_r_base.mi  [.] x264_pixel_sad_x4_16x16
     7.00%   139082  x264_r_base  x264_r_base.mi  [.] x264_pixel_sad_16x16
     6.11%   121470  x264_r_base  x264_r_base.mi  [.] x264_pixel_sad_x3_16x16
     5.89%   116939  x264_r_base  x264_r_base.mi  [.] x264_pixel_sad_x4_8x8
     5.09%   101266  x264_r_base  x264_r_base.mi  [.] quant_4x4
     4.10%    81471  x264_r_base  x264_r_base.mi  [.] mc_chroma
     2.47%    49122  x264_r_base  x264_r_base.mi  [.] x264_pixel_sad_x3_8x8
     2.21%    43928  x264_r_base  x264_r_base.mi  [.] sub4x4_dct
     2.14%    42598  x264_r_base  x264_r_base.mi  [.] pixel_hadamard_ac

With revision 264897 I get:

perf stat

 Performance counter stats for 'numactl -C 0 -l specinvoke':

     495413.105450      task-clock:u (msec)       #    0.999 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             80572      page-faults:u             #    0.163 K/sec
     1573525941814      cycles:u                  #    3.176 GHz                      (83.33%)
       56730573392      stalled-cycles-frontend:u #    3.61% frontend cycles idle     (83.33%)
      397644125819      stalled-cycles-backend:u  #   25.27% backend cycles idle      (83.33%)
     5157395976259      instructions:u            #    3.28  insn per cycle
                                                  #    0.08  stalled cycles per insn  (83.33%)
      421019689027      branches:u                #  849.836 M/sec                    (83.33%)
       10705813341      branch-misses:u           #    2.54% of all branches          (83.33%)

     495.869208013 seconds time elapsed

perf report -n --percent-limit 2

# Event count (approx.): 1576108148398
#
# Overhead  Samples  Command          Shared Object                 Symbol
# ........  ............  ...............  ............................  .................................................
#
    14.20%   282290  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] x264_pixel_satd_8x4
    11.19%   222403  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] get_ref
    10.82%   215061  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] x264_pixel_sad_x4_16x16
     7.00%   139082  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] x264_pixel_sad_16x16
     6.11%   121470  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] x264_pixel_sad_x3_16x16
     5.89%   116939  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] x264_pixel_sad_x4_8x8
     5.09%   101266  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] quant_4x4
     4.10%    81471  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] mc_chroma
     2.47%    49122  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] x264_pixel_sad_x3_8x8
     2.21%    43928  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] sub4x4_dct
     2.14%    42598  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] pixel_hadamard_ac