https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114989
Bug ID: 114989
Summary: Compile time hog when building paml
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: sjames at gcc dot gnu.org
Target Milestone: ---

Created attachment 58130
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58130&action=edit
baseml.i

Noticed this while working on something unrelated, just felt it looked a bit sluggish.

The source file is a bit complex, but it's also _not_ huge generated gunk, which I think makes this a bit more interesting.

I don't have older non-checking compilers around to see if it's a regression, sorry.

```
# time gcc-13 baseml.i -c -O2

real    0m3.877s
user    0m3.755s
sys     0m0.087s
```

```
# gcc-13 baseml.i -c -O2 -ftime-report

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)  1819k (  1%)
 phase parsing                      :   0.09 (  2%)   0.21 ( 34%)   0.31 (  7%)  9507k (  6%)
 phase opt and generate             :   3.59 ( 98%)   0.40 ( 66%)   4.01 ( 93%)   138M ( 93%)
 dump files                         :   0.04 (  1%)   0.01 (  2%)   0.05 (  1%)      0 (  0%)
 callgraph construction             :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)  3240k (  2%)
 callgraph optimization             :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)   2592 (  0%)
 callgraph functions expansion      :   3.16 ( 86%)   0.32 ( 52%)   3.50 ( 81%)   111M ( 75%)
 callgraph ipa passes               :   0.39 ( 11%)   0.07 ( 11%)   0.46 ( 11%)    14M ( 10%)
 ipa function summary               :   0.02 (  1%)   0.00 (  0%)   0.02 (  0%)  1174k (  1%)
 ipa inlining heuristics            :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    54k (  0%)
 ipa pure const                     :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    72k (  0%)
 ipa free inline summary            :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)      0 (  0%)
 cfg construction                   :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   580k (  0%)
 cfg cleanup                        :   0.08 (  2%)   0.00 (  0%)   0.08 (  2%)   703k (  0%)
 trivially dead code                :   0.02 (  1%)   0.00 (  0%)   0.02 (  0%)      0 (  0%)
 df reaching defs                   :   0.03 (  1%)   0.02 (  3%)   0.03 (  1%)      0 (  0%)
 df live regs                       :   0.13 (  4%)   0.01 (  2%)   0.14 (  3%)      0 (  0%)
 df live&initialized regs           :   0.02 (  1%)   0.01 (  2%)   0.06 (  1%)      0 (  0%)
 df use-def / def-use chains        :   0.04 (  1%)   0.00 (  0%)   0.01 (  0%)      0 (  0%)
 df reg dead/unused notes           :   0.08 (  2%)   0.00 (  0%)   0.05 (  1%)  1648k (  1%)
 register information               :   0.03 (  1%)   0.00 (  0%)   0.00 (  0%)      0 (  0%)
 alias analysis                     :   0.02 (  1%)   0.01 (  2%)   0.05 (  1%)  4296k (  3%)
 alias stmt walking                 :   0.10 (  3%)   0.00 (  0%)   0.15 (  3%)   658k (  0%)
 register scan                      :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    24k (  0%)
 rebuild jump labels                :   0.02 (  1%)   0.00 (  0%)   0.00 (  0%)      0 (  0%)
 preprocessing                      :   0.02 (  1%)   0.06 ( 10%)   0.11 (  3%)   331k (  0%)
 lexical analysis                   :   0.03 (  1%)   0.08 ( 13%)   0.13 (  3%)      0 (  0%)
 parser (global)                    :   0.01 (  0%)   0.01 (  2%)   0.03 (  1%)  2449k (  2%)
 parser function body               :   0.03 (  1%)   0.06 ( 10%)   0.04 (  1%)  6620k (  4%)
 early inlining heuristics          :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    41k (  0%)
 inline parameters                  :   0.02 (  1%)   0.00 (  0%)   0.01 (  0%)   328k (  0%)
 tree gimplify                      :   0.02 (  1%)   0.00 (  0%)   0.02 (  0%)  7063k (  5%)
 tree eh                            :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    29k (  0%)
 tree CFG construction              :   0.01 (  0%)   0.01 (  2%)   0.02 (  0%)  2159k (  1%)
 tree CFG cleanup                   :   0.03 (  1%)   0.00 (  0%)   0.04 (  1%)    97k (  0%)
 tree tail merge                    :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   980k (  1%)
 tree VRP                           :   0.15 (  4%)   0.00 (  0%)   0.17 (  4%)  4202k (  3%)
 tree Early VRP                     :   0.09 (  2%)   0.01 (  2%)   0.11 (  3%)  1768k (  1%)
 tree copy propagation              :   0.03 (  1%)   0.00 (  0%)   0.01 (  0%)   108k (  0%)
 tree PTA                           :   0.13 (  4%)   0.01 (  2%)   0.04 (  1%)   665k (  0%)
 tree SSA rewrite                   :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)  2202k (  1%)
 tree SSA incremental               :   0.02 (  1%)   0.01 (  2%)   0.00 (  0%)   745k (  0%)
 tree operand scan                  :   0.01 (  0%)   0.02 (  3%)   0.03 (  1%)  3714k (  2%)
 dominator optimization             :   0.21 (  6%)   0.02 (  3%)   0.24 (  6%)  3760k (  2%)
 backwards jump threading           :   0.09 (  2%)   0.01 (  2%)   0.14 (  3%)   837k (  1%)
 isolate eroneous paths             :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    856 (  0%)
 tree CCP                           :   0.12 (  3%)   0.01 (  2%)   0.13 (  3%)   924k (  1%)
 tree reassociation                 :   0.02 (  1%)   0.00 (  0%)   0.00 (  0%)    46k (  0%)
 tree PRE                           :   0.10 (  3%)   0.01 (  2%)   0.14 (  3%)  3693k (  2%)
 tree FRE                           :   0.15 (  4%)   0.04 (  7%)   0.16 (  4%)  2644k (  2%)
 tree RPO VN                        :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)   304k (  0%)
 tree code sinking                  :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)  1562k (  1%)
 tree forward propagate             :   0.00 (  0%)   0.01 (  2%)   0.08 (  2%)   336k (  0%)
 tree conservative DCE              :   0.03 (  1%)   0.01 (  2%)   0.04 (  1%)    43k (  0%)
 tree aggressive DCE                :   0.01 (  0%)   0.01 (  2%)   0.04 (  1%)   682k (  0%)
 tree buildin call DCE              :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    23k (  0%)
 tree DSE                           :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   204k (  0%)
 PHI merge                          :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)   152k (  0%)
 tree loop invariant motion         :   0.01 (  0%)   0.01 (  2%)   0.01 (  0%)    86k (  0%)
 tree canonical iv                  :   0.00 (  0%)   0.00 (  0%)   0.03 (  1%)  1120k (  1%)
 scev constant prop                 :   0.01 (  0%)   0.01 (  2%)   0.00 (  0%)   222k (  0%)
 complete unrolling                 :   0.03 (  1%)   0.00 (  0%)   0.08 (  2%)  3788k (  2%)
 tree vectorization                 :   0.02 (  1%)   0.01 (  2%)   0.02 (  0%)  1974k (  1%)
 tree slp vectorization             :   0.02 (  1%)   0.03 (  5%)   0.03 (  1%)  9543k (  6%)
 tree loop distribution             :   0.01 (  0%)   0.01 (  2%)   0.02 (  0%)   981k (  1%)
 tree iv optimization               :   0.05 (  1%)   0.01 (  2%)   0.05 (  1%)  7110k (  5%)
 predictive commoning               :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   799k (  1%)
 tree SSA uncprop                   :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   8640 (  0%)
 tree strlen optimization           :   0.02 (  1%)   0.00 (  0%)   0.02 (  0%)   477k (  0%)
 tree modref                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   272k (  0%)
 dominance frontiers                :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)      0 (  0%)
 dominance computation              :   0.03 (  1%)   0.00 (  0%)   0.03 (  1%)      0 (  0%)
 control dependences                :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)      0 (  0%)
 out of ssa                         :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    44k (  0%)
 expand vars                        :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)  1394k (  1%)
 expand                             :   0.03 (  1%)   0.01 (  2%)   0.01 (  0%)    10M (  7%)
 forward prop                       :   0.11 (  3%)   0.00 (  0%)   0.06 (  1%)   182k (  0%)
 CSE                                :   0.06 (  2%)   0.00 (  0%)   0.08 (  2%)   406k (  0%)
 dead code elimination              :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)      0 (  0%)
 dead store elim1                   :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)   823k (  1%)
 dead store elim2                   :   0.02 (  1%)   0.00 (  0%)   0.01 (  0%)  1231k (  1%)
 loop init                          :   0.02 (  1%)   0.00 (  0%)   0.04 (  1%)    10M (  7%)
 loop invariant motion              :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)   119k (  0%)
 loop unrolling                     :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)   769k (  1%)
 loop fini                          :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)      0 (  0%)
 CPROP                              :   0.05 (  1%)   0.01 (  2%)   0.06 (  1%)  1971k (  1%)
 PRE                                :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)   362k (  0%)
 CSE 2                              :   0.03 (  1%)   0.00 (  0%)   0.07 (  2%)   153k (  0%)
 branch prediction                  :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  1254k (  1%)
 combiner                           :   0.08 (  2%)   0.02 (  3%)   0.10 (  2%)  4302k (  3%)
 if-conversion                      :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   470k (  0%)
 integrated RA                      :   0.27 (  7%)   0.00 (  0%)   0.22 (  5%)    13M (  9%)
 LRA non-specific                   :   0.08 (  2%)   0.00 (  0%)   0.05 (  1%)  1171k (  1%)
 LRA virtuals elimination           :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)   500k (  0%)
 LRA reload inheritance             :   0.02 (  1%)   0.00 (  0%)   0.03 (  1%)   186k (  0%)
 LRA create live ranges             :   0.03 (  1%)   0.00 (  0%)   0.05 (  1%)   152k (  0%)
 LRA hard reg assignment            :   0.02 (  1%)   0.00 (  0%)   0.00 (  0%)      0 (  0%)
 LRA rematerialization              :   0.01 (  0%)   0.01 (  2%)   0.02 (  0%)    288 (  0%)
 reload CSE regs                    :   0.04 (  1%)   0.00 (  0%)   0.10 (  2%)  1630k (  1%)
 ree                                :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    90k (  0%)
 thread pro- & epilogue             :   0.02 (  1%)   0.00 (  0%)   0.03 (  1%)   584k (  0%)
 combine stack adjustments          :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)      0 (  0%)
 peephole 2                         :   0.04 (  1%)   0.00 (  0%)   0.01 (  0%)   425k (  0%)
 hard reg cprop                     :   0.02 (  1%)   0.00 (  0%)   0.02 (  0%)   6720 (  0%)
 scheduling 2                       :   0.16 (  4%)   0.00 (  0%)   0.19 (  4%)   648k (  0%)
 machine dep reorg                  :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    464 (  0%)
 reorder blocks                     :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)   727k (  0%)
 shorten branches                   :   0.02 (  1%)   0.00 (  0%)   0.01 (  0%)      0 (  0%)
 final                              :   0.03 (  1%)   0.00 (  0%)   0.01 (  0%)  2345k (  2%)
 tree if-combine                    :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    40k (  0%)
 address lowering                   :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    20k (  0%)
 tree loop if-conversion            :   0.04 (  1%)   0.00 (  0%)   0.02 (  0%)   793k (  1%)
 access analysis                    :   0.02 (  1%)   0.01 (  2%)   0.02 (  0%)   321k (  0%)
 rest of compilation                :   0.07 (  2%)   0.02 (  3%)   0.08 (  2%)  1036k (  1%)
 remove unused locals               :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)      0 (  0%)
 address taken                      :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)      0 (  0%)
 repair loop structures             :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   4056 (  0%)
 TOTAL                              :   3.68          0.61          4.32          149M
```

* gcc-13 -O0 takes 0.6s
* gcc-13 -O1 takes 2.2s
* clang 18.1.5 -O2 takes 0.3s.

```
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/13/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-13.2.1_p20240210/work/gcc-13-20240210/configure --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/13 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/13/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/13 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/13/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/13/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13 --disable-silent-rules --disable-dependency-tracking --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/13/python --enable-objc-gc --enable-languages=c,c++,objc,obj-c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --disable-libunwind-exceptions --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 13.2.1_p20240210 p14' --with-gcc-major-version-only --enable-libstdcxx-time --enable-lto --disable-libstdcxx-pch --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-multilib --with-multilib-list=m32,m64 --disable-fixed-point --enable-targets=all --enable-libgomp --disable-libssp --disable-libada --disable-cet --disable-systemtap --disable-valgrind-annotations --disable-vtable-verify --disable-libvtv --without-zstd --with-isl --disable-isl-version-check --enable-default-pie --enable-default-ssp --disable-fixincludes
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.2.1 20240210 (Gentoo 13.2.1_p20240210 p14)
```
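
For anyone triaging this, the `-ftime-report` table above is easier to read once the rows are ranked by wall time. Below is a minimal sketch of that in Python. `top_passes` is a hypothetical helper, and both the regular expression and the default `skip_prefixes` are assumptions based only on the layout shown in this report (the `phase ` and `callgraph ` rows look like aggregates that contain other timers). None of this is a format GCC guarantees.

```python
import re

# Assumed row layout, taken from the report in this bug:
#   name : usr ( p%) sys ( p%) wall ( p%) ggc ( p%)
# The TOTAL row has no percentages, so it deliberately does not match.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)"
)


def top_passes(report, n=5, skip_prefixes=("phase ", "callgraph ")):
    """Return the n rows with the highest wall time as (name, wall) tuples.

    Rows whose name starts with one of skip_prefixes are dropped, on the
    assumption that they are aggregate timers rather than single passes.
    """
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, TOTAL, blank lines, shell prompt, ...
        name = m.group("name")
        if name.startswith(skip_prefixes):
            continue
        rows.append((name, float(m.group("wall"))))
    # sorted() is stable, so ties keep their order from the report.
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


# A few rows copied from the report above, for illustration only.
SAMPLE = """\
 phase opt and generate             :   3.59 ( 98%)   0.40 ( 66%)   4.01 ( 93%)   138M ( 93%)
 dominator optimization             :   0.21 (  6%)   0.02 (  3%)   0.24 (  6%)  3760k (  2%)
 integrated RA                      :   0.27 (  7%)   0.00 (  0%)   0.22 (  5%)    13M (  9%)
 scheduling 2                       :   0.16 (  4%)   0.00 (  0%)   0.19 (  4%)   648k (  0%)
 TOTAL                              :   3.68          0.61          4.32          149M
"""

if __name__ == "__main__":
    for name, wall in top_passes(SAMPLE, n=3):
        print(f"{wall:5.2f}s  {name}")
```

To use it on a real run, capture the report (GCC writes it to stderr) into a string and pass that in place of `SAMPLE`.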