https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120
Bug ID: 120120
Summary: gcc-16: performance regression with -O3 compared to gcc-15
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: manuel.lauss at googlemail dot com
Target Milestone: ---

Created attachment 61325
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61325&action=edit
example code taking the perf hit at O3

On some code I use, I noticed a large performance regression in gcc-16, starting at around 21.04.2025.
I've attached sample C code which according to perf takes almost all processing time.

Happens with "-O3 -march=znver5 -mtune=znver5 -pipe", at -O2 both -15 and -16 are equally slow.

Perf stats:

gcc-15:
 Performance counter stats for './sanplay -2 RAM.SAN':

              6,33 msec task-clock:u                #    0,949 CPUs utilized
               808      page-faults:u               #  127,589 K/sec
        85.738.923      instructions:u              #    3,10  insn per cycle
                                                    #    0,06  stalled cycles per insn
        27.659.116      cycles:u                    #    4,368 GHz
         4.788.925      stalled-cycles-frontend:u   #   17,31% frontend cycles idle
         8.000.727      branches:u                  #    1,263 G/sec
           275.954      branch-misses:u             #    3,45% of all branches

gcc-16:
 Performance counter stats for './sanplay -2 /home/mano/games/Outlaws/RAM.SAN':

             13,02 msec task-clock:u                #    0,974 CPUs utilized
       314.392.362      instructions:u              #    4,97  insn per cycle
                                                    #    0,02  stalled cycles per insn
        63.277.723      cycles:u                    #    4,861 GHz
         5.510.316      stalled-cycles-frontend:u   #    8,71% frontend cycles idle
        53.730.810      branches:u                  #    4,127 G/sec
           305.375      branch-misses:u             #    0,57% of all branches

The amount of instructions executed is 3.6x higher; on a larger example file it's up to 4.5x instructions executed; this is not zen5 specific but happens on a haswell as well.

At -O2 both gcc-15 and gcc-16 have identical performance.

Full source is at https://github.com/mlauss2/sandec
Demo file can be grabbed from https://samples.mplayerhq.hu/game-formats/la-san/outlaws/ram.san

I'll do a bisection next.

Thanks!
     Manuel