https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89853

            Bug ID: 89853
           Summary: Regression of 525.x264_r at -O2 (and generic tuning)
                    on AMD EPYC
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
  Target Milestone: ---

I have detected a 7% regression of 525.x264_r from SPEC INTrate 2017
at -O2 and generic march/tuning on AMD EPYC (znver1) CPUs (I have not seen
it on an Intel CPU), compared to the gcc-8-branch.

I have bisected it to r264897.

With revision 264896 I get:

  perf stat:

    Performance counter stats for 'numactl -C 0 -l specinvoke':

        495413.105450      task-clock:u (msec)       #    0.999 CPUs utilized
                    0      context-switches:u        #    0.000 K/sec
                    0      cpu-migrations:u          #    0.000 K/sec
                80572      page-faults:u             #    0.163 K/sec
        1573525941814      cycles:u                  #    3.176 GHz                      (83.33%)
          56730573392      stalled-cycles-frontend:u #    3.61% frontend cycles idle     (83.33%)
         397644125819      stalled-cycles-backend:u  #   25.27% backend cycles idle      (83.33%)
        5157395976259      instructions:u            #    3.28  insn per cycle
                                                     #    0.08  stalled cycles per insn  (83.33%)
         421019689027      branches:u                #  849.836 M/sec                    (83.33%)
          10705813341      branch-misses:u           #    2.54% of all branches          (83.33%)

        495.869208013 seconds time elapsed


  perf report -n --percent-limit 2

   # Event count (approx.): 1576108148398
   #
   # Overhead    Samples  Command      Shared Object   Symbol
   # ........  .........  ...........  ..............  ............................
   #
       14.20%     282290  x264_r_base  x264_r_base.mi  [.] x264_pixel_satd_8x4
       11.19%     222403  x264_r_base  x264_r_base.mi  [.] get_ref
       10.82%     215061  x264_r_base  x264_r_base.mi  [.] x264_pixel_sad_x4_16x16
        7.00%     139082  x264_r_base  x264_r_base.mi  [.] x264_pixel_sad_16x16
        6.11%     121470  x264_r_base  x264_r_base.mi  [.] x264_pixel_sad_x3_16x16
        5.89%     116939  x264_r_base  x264_r_base.mi  [.] x264_pixel_sad_x4_8x8
        5.09%     101266  x264_r_base  x264_r_base.mi  [.] quant_4x4
        4.10%      81471  x264_r_base  x264_r_base.mi  [.] mc_chroma
        2.47%      49122  x264_r_base  x264_r_base.mi  [.] x264_pixel_sad_x3_8x8
        2.21%      43928  x264_r_base  x264_r_base.mi  [.] sub4x4_dct
        2.14%      42598  x264_r_base  x264_r_base.mi  [.] pixel_hadamard_ac

With revision 264897 I get:

  perf stat

    Performance counter stats for 'numactl -C 0 -l specinvoke':

        495413.105450      task-clock:u (msec)       #    0.999 CPUs utilized
                    0      context-switches:u        #    0.000 K/sec
                    0      cpu-migrations:u          #    0.000 K/sec
                80572      page-faults:u             #    0.163 K/sec
        1573525941814      cycles:u                  #    3.176 GHz                      (83.33%)
          56730573392      stalled-cycles-frontend:u #    3.61% frontend cycles idle     (83.33%)
         397644125819      stalled-cycles-backend:u  #   25.27% backend cycles idle      (83.33%)
        5157395976259      instructions:u            #    3.28  insn per cycle
                                                     #    0.08  stalled cycles per insn  (83.33%)
         421019689027      branches:u                #  849.836 M/sec                    (83.33%)
          10705813341      branch-misses:u           #    2.54% of all branches          (83.33%)

        495.869208013 seconds time elapsed


  perf report -n --percent-limit 2

   # Event count (approx.): 1576108148398
   #
   # Overhead       Samples  Command          Shared Object                 Symbol
   # ........  ............  ...............  ............................  .................................................
   #
       14.20%        282290  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] x264_pixel_satd_8x4
       11.19%        222403  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] get_ref
       10.82%        215061  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] x264_pixel_sad_x4_16x16
        7.00%        139082  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] x264_pixel_sad_16x16
        6.11%        121470  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] x264_pixel_sad_x3_16x16
        5.89%        116939  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] x264_pixel_sad_x4_8x8
        5.09%        101266  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] quant_4x4
        4.10%         81471  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] mc_chroma
        2.47%         49122  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] x264_pixel_sad_x3_8x8
        2.21%         43928  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] sub4x4_dct
        2.14%         42598  x264_r_base.min  x264_r_base.mine-gen-std-m64  [.] pixel_hadamard_ac
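
The annotations after the '#' in the perf stat listings (clock rate, insn per
cycle, stall and branch-miss percentages) follow from the raw counters. A
minimal Python sketch of that arithmetic, useful for cross-checking a run; the
counter values are copied from the listing above, while the function and key
names are illustrative and not part of perf:

```python
# Raw counters copied from the perf stat listing in this report.
COUNTERS = {
    "task_clock_ms": 495413.105450,
    "cycles": 1573525941814,
    "stalled_cycles_frontend": 56730573392,
    "stalled_cycles_backend": 397644125819,
    "instructions": 5157395976259,
    "branches": 421019689027,
    "branch_misses": 10705813341,
}


def derived_metrics(c):
    """Recompute the ratios perf stat prints next to the raw counters."""
    seconds = c["task_clock_ms"] / 1000.0
    return {
        # Average clock rate while the task was running.
        "ghz": c["cycles"] / seconds / 1e9,
        # Instructions retired per cycle.
        "ipc": c["instructions"] / c["cycles"],
        # Share of cycles reported as stalled in the frontend / backend.
        "frontend_idle_pct": 100.0 * c["stalled_cycles_frontend"] / c["cycles"],
        "backend_idle_pct": 100.0 * c["stalled_cycles_backend"] / c["cycles"],
        # Share of branches that were mispredicted.
        "branch_miss_pct": 100.0 * c["branch_misses"] / c["branches"],
    }


if __name__ == "__main__":
    for name, value in derived_metrics(COUNTERS).items():
        print(f"{name:20s} {value:8.3f}")
```

Feeding the counters from two builds through the same function gives a
like-for-like comparison of IPC and stall ratios.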
