https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531
--- Comment #14 from Jan Hubicka <hubicka at ucw dot cz> ---
As for a bit of history on this: I introduced the split -O2 and -O3 limits in order to be able to enable -finline-small-functions at -O2, which we found to be really important for C++ codebases that no longer care much about explicit use of the inline keyword. To do that it was necessary to find settings that do not grow -O2 binaries significantly (or that even shrink them) and that yield measurably better performance.

On SPEC CPU without LTO the differences were quite small. With LTO they were more noticeable, and on firefox/clang and similar codebases built with LTO they were significant (often double-digit).

Pushing up the -O2 limits can make sense, but it needs to be done carefully: in the longer term, IMO, we do not want to let -O2 binaries grow faster than their performance. Sadly this figure is not that great.

https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/branch
loads slowly but has some data.

SPEC2k17 with -O2 -flto on 2nd generation Zen performs as follows:

           gcc-7   gcc-8   gcc-9  gcc-10  gcc-11  gcc-12  gcc-13  gcc-14  gcc-trunk
SPECint    2.55%   2.90%       ~   4.55%   4.47%  11.29%  12.60%  14.13%     13.42%
SPECfp         ~       ~       ~       ~       ~   4.15%   4.98%   5.30%      5.18%

Those are scores (bigger is better) relative to gcc-6, in percent. ~ is noise. The large improvement in gcc-12 comes from enabling the vectorizer at -O2; for SPECint it comes primarily from x264.

Text section size, meanwhile:

           gcc-7   gcc-8   gcc-9  gcc-10  gcc-11  gcc-12  gcc-13  gcc-14  gcc-trunk
int            ~       ~       ~   9.77%   9.57%   8.72%   8.26%  10.68%     10.59%
fp             ~   2.40%       ~  18.30%  18.24%  18.92%  18.66%  22.23%     22.27%

Those are sizes (smaller is better). So we do get considerable bloat. In GCC 10 the Fortran ABI changed, and an important part of the 18% FP bloat is caused by it.

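One rough way to read the two summary tables against the "binaries should not grow faster than their performance" concern is to compare, per release, the score gain with the text size growth. The sketch below does only that. The numbers are copied from the tables above; treating `~` (noise) as 0.0 and the helper name `size_outpaces_speed` are assumptions made for illustration, not part of the original measurements.

```python
# Summary figures from the tables above, relative to gcc-6, in percent.
# Assumption: "~" (noise) is treated as 0.0 for this illustration.
RELEASES = ["gcc-7", "gcc-8", "gcc-9", "gcc-10", "gcc-11",
            "gcc-12", "gcc-13", "gcc-14", "gcc-trunk"]

# SPEC score improvement (bigger is better).
SCORE_GAIN = {
    "int": [2.55, 2.90, 0.0, 4.55, 4.47, 11.29, 12.60, 14.13, 13.42],
    "fp":  [0.0, 0.0, 0.0, 0.0, 0.0, 4.15, 4.98, 5.30, 5.18],
}

# Text section growth (smaller is better).
TEXT_GROWTH = {
    "int": [0.0, 0.0, 0.0, 9.77, 9.57, 8.72, 8.26, 10.68, 10.59],
    "fp":  [0.0, 2.40, 0.0, 18.30, 18.24, 18.92, 18.66, 22.23, 22.27],
}


def size_outpaces_speed(suite):
    """Return the releases where text size grew more than the score did."""
    return [rel
            for rel, gain, growth
            in zip(RELEASES, SCORE_GAIN[suite], TEXT_GROWTH[suite])
            if growth > gain]


for suite in ("int", "fp"):
    print(suite, size_outpaces_speed(suite))
```

By this crude measure the integer suite only falls behind in gcc-10 and gcc-11, while the FP suite grows faster than it speeds up in every release from gcc-10 on. Keep in mind that a good part of the FP growth is attributed above to the Fortran ABI change.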
Here are individual changes:

runtime (only benchmarks with off-noise changes):

Test Name          gcc-7   gcc-8   gcc-9  gcc-10  gcc-11  gcc-12  gcc-13  gcc-14  gcc-trunk
FP/538.imagick    25.01%  25.64%  27.57%  21.51%  21.75%  19.46%  19.88%  23.20%     22.91%
INT/525.x264_r     7.25%   6.20%   6.58%   7.48%       ~  -37.7%  -40.4%  -41.6%    -39.90%
INT/548.exchan    -17.9%  -17.8%  -14.9%  -14.1%  -5.88%  -13.9%  -21.6%  -25.0%    -26.48%
INT/531.deepsj    -2.46%       ~       ~  -15.0%  -16.1%  -17.9%  -18.8%  -19.3%    -19.62%
FP/503.bwaves_    -6.30%       ~  -2.71%  16.95%  16.71%  16.65%  16.94%  16.94%     16.70%
FP/527.cam4_r     -2.99%  -2.33%  -10.7%  -11.3%  -10.9%  -11.8%  -11.9%  -12.5%    -11.37%
FP/521.wrf_r           ~  -2.40%  -5.99%  -6.10%  -5.66%  -9.45%  -9.28%  -9.82%     -9.95%
FP/554.roms_r          ~   5.79%   2.51%       ~   5.24%   7.95%   9.35%   9.11%      9.68%
INT/520.omnetp    -3.26%  -3.45%       ~  -3.82%  -6.71%  -7.37%  -6.57%  -6.83%     -5.62%
FP/549.fotonik         ~       ~  -5.60%  -8.26%  -8.61%  -3.80%  -4.82%  -3.26%     -5.48%
INT/541.leela_    -2.47%  -2.19%       ~  -4.57%  -6.32%  -4.76%  -5.69%  -6.72%     -5.88%
INT/500.perlbe         ~  -2.11%  -2.34%  -6.03%  -4.51%       ~       ~  -5.01%     -4.52%
INT/523.xalanc    -2.42%  -3.18%  -2.26%  -3.75%  -2.31%  -5.95%  -2.02%  -3.52%          ~
FP/511.povray_         ~       ~   5.21%  -6.54%       ~       ~       ~       ~          ~
INT/505.mcf_r          ~       ~       ~       ~       ~  -2.82%  -3.32%  -3.71%     -4.14%
FP/510.parest_         ~       ~       ~       ~  -3.31%       ~  -2.28%  -3.03%     -3.39%
FP/519.lbm_r       3.33%       ~       ~  -4.72%       ~       ~       ~       ~          ~
FP/544.nab_r           ~       ~       ~       ~       ~       ~  -2.43%       ~     -3.15%
FP/508.namd_r          ~       ~       ~       ~   4.20%       ~       ~  -2.35%     -2.02%

Those are times (smaller is better).

- The ImageMagick regression since GCC 7 is a store-to-load forwarding problem: we vectorize a load in one function of a value that was stored piecewise in another.
- The x264 improvement in GCC 12 is vectorization at -O2 (which arguably helps primarily code that should be built with -Ofast/-O3 anyway).
- The exchange improvement in GCC 7 is special handling of self-recursive functions with nested loops (quite specific to the benchmark).
- I forgot what caused the changes in deepsjeng in GCC 10 and cam4 in GCC 9.

size:

Test Name       GCC 6 size   gcc-7   gcc-8   gcc-9  gcc-10  gcc-11  gcc-12  gcc-13  gcc-14  gcc-trunk
FP/521.wrf_rg     11.85 MB       ~   5.78%   4.43%  33.11%  33.11%  34.41%  34.41%  38.42%     38.41%
INT/557.xz_rg     75.53 KB       ~       ~       ~  30.10%  29.47%  29.18%  30.30%  33.28%     33.57%
FP/totalg         28.08 MB       ~   2.40%       ~  18.30%  18.24%  18.92%  18.66%  22.23%     22.27%
INT/523.xalanc     1.98 MB       ~       ~  15.05%  14.85%  14.54%  13.62%  13.80%  17.31%     17.07%
FP/526.blender     6.21 MB       ~       ~  -2.50%  15.93%  15.97%  15.70%  14.08%  18.47%     18.40%
INT/541.leela     74.37 KB       ~       ~  13.36%  -8.84%  -8.54%  -15.7%  -15.3%  -9.58%    -10.34%
INT/500.perlb      1.50 MB       ~       ~       ~   9.20%   9.08%  10.08%   9.69%  12.44%     12.38%
INT/502.gcc_r      6.16 MB       ~       ~  -2.18%  10.40%  10.59%   8.50%   8.07%  10.14%     10.10%
FP/549.fotoni    325.23 KB       ~       ~       ~   4.39%   4.33%   8.35%   9.28%  11.76%     10.82%
FP/519.lbm_rg     10.53 KB  -3.90%  -5.54%  -4.72%  -3.83%  -3.60%  -6.58%  -6.43%  -5.28%     -5.28%
FP/538.imagic      1.03 MB       ~   2.32%       ~   7.36%   7.47%   6.49%   6.22%   5.17%      4.74%
FP/544.nab_rg     83.99 KB       ~  -2.37%  -3.63%  -5.02%  -5.43%  -5.33%  -7.50%  -5.35%     -5.49%
INT/531.deeps     60.41 KB       ~       ~       ~   2.54%   2.81%   7.06%   6.65%   9.91%     10.01%
FP/511.povray    771.09 KB       ~       ~       ~   7.83%   9.25%   6.44%   2.73%   5.84%      5.68%
FP/507.cactuB      2.54 MB   6.59%       ~  -3.85%       ~       ~   2.81%   5.32%   7.56%      9.22%
FP/527.cam4_r      2.60 MB       ~   2.25%       ~   4.21%   3.92%   3.96%   5.02%   6.37%      6.11%
INT/548.excha     65.35 KB  -7.62%   2.14%       ~  -3.24%  -3.58%       ~       ~   5.92%      6.14%
FP/510.parest      1.29 MB  -2.11%       ~   9.79%       ~  -2.17%  -3.89%  -4.43%   3.44%      3.22%
INT/520.omnet      1.07 MB       ~       ~  -3.96%   4.31%       ~   4.71%   2.48%   3.83%      3.52%
FP/508.namd_r    829.33 KB       ~       ~  13.51%       ~       ~       ~       ~   4.11%      3.25%
INT/505.mcf_r     12.59 KB       ~  -3.25%  -5.23%  -2.24%  -4.12%  -2.88%  -2.26%   2.71%          ~
INT/525.x264_    404.39 KB       ~       ~       ~  -5.23%  -5.15%  -3.84%  -3.88%       ~          ~
FP/554.roms_r    563.96 KB  -2.55%       ~       ~       ~       ~       ~  -4.60%  -4.17%     -4.59%
FP/503.bwaves     30.62 KB   2.74%       ~  -2.24%  -2.52%  -2.43%       ~       ~       ~          ~

So the GCC binary, for example, got about 10% bigger.