https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116062
Bug ID: 116062
Summary: Exponentially slow compilation at -O3 with __attribute__((flatten))
Product: gcc
Version: 14.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: valentin at tolmer dot fr
Target Milestone: ---

After much reducing, I got to this example (as you can imagine, it comes from very templated code):

$ cat example.cpp
#include <set>

using T = std::set<std::set<std::set<int>>>;
using TSeq = std::set<T>;

struct Inserter {
  TSeq& out;
  void operator()(const T& t) const __attribute__((flatten)) { out.insert(t); }
};

struct Forwarder {
  Inserter& consumer;
  void operator()(const T& t) const __attribute__((flatten)) { consumer(t); }
};

struct Container {
  void iterate(const Forwarder&) const __attribute__((flatten));
  TSeq sequence;
};

void Container::iterate(const Forwarder& consumer) const {
  return [&]() __attribute__((flatten)) {
    for (const T& elem : sequence) {
      consumer(elem);
    }
  } ();
}

$ time g++-14.1.0 -std=c++20 -O3 example.cpp -o example.o -c
46.07s user 0.44s system 99% cpu 46.518 total

It scales up _very_ fast with the number of nested std::set in T: 2 nested sets is 2.8s, 3 is 46s, 4 is over 6min35.

It's very sensitive: manually inlining the lambda into iterate brings the compilation time to <1s, and removing pretty much any flatten makes it go very fast as well. Interestingly, even moving the definition of iterate into the class makes the bug go away.

The size of the output scales similarly, going from 77K to 374K to 1.5M when adding nested std::set.

Interestingly, running with

  -O2 -fgcse-after-reload -fipa-cp-clone -floop-interchange -floop-unroll-and-jam -fpeel-loops -fpredictive-commoning -fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre -funswitch-loops -fvect-cost-model=dynamic -fversion-loops-for-strides

instead of -O3 (which should be equivalent) also compiles very fast.
I tried to compare the output of -Q --help=optimizers between the two, but there's no diff, so I was unable to pinpoint the optimization pass responsible.

It's not a new thing: it also compiled very slowly with gcc 4.9.4 (I spot-checked a few versions in between), obviously with -std=c++11. I didn't go any further back because it started to require meaningful changes to the code.
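
To make the scaling with nesting depth easier to measure, the reproducer could be parameterized roughly as follows. This is only a sketch: the DEPTH macro and the Nested helper template are hypothetical names introduced here, not part of the original example, and the default of 2 is chosen just so that a plain build stays quick. Everything else is the reported code unchanged.

```cpp
#include <set>
#include <type_traits>

// DEPTH is a hypothetical knob (not in the original report): the number of
// nested std::set levels in T. The original example corresponds to DEPTH=3.
#ifndef DEPTH
#define DEPTH 2
#endif

// Nested<N>::type wraps int in N levels of std::set (hypothetical helper).
template <int N>
struct Nested {
  using type = std::set<typename Nested<N - 1>::type>;
};
template <>
struct Nested<0> {
  using type = int;
};

// Sanity check: three levels give exactly the type used in the report.
static_assert(
    std::is_same<Nested<3>::type, std::set<std::set<std::set<int>>>>::value,
    "Nested<3> should match the original T");

using T = Nested<DEPTH>::type;
using TSeq = std::set<T>;

// The rest is the reproducer as reported.
struct Inserter {
  TSeq& out;
  void operator()(const T& t) const __attribute__((flatten)) { out.insert(t); }
};

struct Forwarder {
  Inserter& consumer;
  void operator()(const T& t) const __attribute__((flatten)) { consumer(t); }
};

struct Container {
  void iterate(const Forwarder&) const __attribute__((flatten));
  TSeq sequence;
};

void Container::iterate(const Forwarder& consumer) const {
  return [&]() __attribute__((flatten)) {
    for (const T& elem : sequence) {
      consumer(elem);
    }
  } ();
}
```

With this variant, the command line from the report only needs an extra -DDEPTH=N (for example -DDEPTH=3 to match the original example), so the timings for different depths can be collected without editing the file.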