https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116062

            Bug ID: 116062
           Summary: Exponentially slow compilation at -O3 with
                    __attribute__((flatten))
           Product: gcc
           Version: 14.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: valentin at tolmer dot fr
  Target Milestone: ---

After a lot of reduction, I arrived at this example (as you can imagine, it
comes from heavily templated code):

$ cat example.cpp
#include <set>
using T = std::set<std::set<std::set<int>>>;
using TSeq = std::set<T>;
struct Inserter {
  TSeq& out;
  void operator()(const T& t) const __attribute__((flatten)) { out.insert(t); }
};
struct Forwarder {
  Inserter& consumer;
  void operator()(const T& t) const __attribute__((flatten)) { consumer(t); }
};
struct Container {
  void iterate(const Forwarder&) const __attribute__((flatten));
  TSeq sequence;
};
void Container::iterate(const Forwarder& consumer) const {
  return [&]() __attribute__((flatten)) {
    for (const T& elem : sequence) {
      consumer(elem);
    }
  }
  ();
}
$ time g++-14.1.0 -std=c++20 -O3 example.cpp -o example.o -c
46.07s user 0.44s system 99% cpu 46.518 total

It scales up _very_ fast with the number of nested std::sets in T: 2 nested
sets take 2.8s, 3 take 46s, and 4 take over 6min35s. It's also very sensitive
to the exact code: inlining the lambda's body directly into iterate brings the
compilation time below 1s, and removing pretty much any of the flatten
attributes makes it very fast as well. Interestingly, even moving the
definition of iterate into the class body makes the bug go away.
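To sweep the nesting depth without editing the type by hand, the first three
lines of example.cpp (the #include and the two using declarations) could be
replaced by something like the following. This is an untested sketch: the
Nested helper and the NEST_DEPTH macro are hypothetical names introduced here,
not part of the original reduction. Since they only spell the same types
differently, they should not change what GCC has to inline.

```cpp
#include <set>
#include <type_traits>

// Hypothetical helper: Nested<N>::type is int wrapped in N levels of std::set.
template <int N>
struct Nested {
  using type = std::set<typename Nested<N - 1>::type>;
};

// Recursion stops at depth 0.
template <>
struct Nested<0> {
  using type = int;
};

// Hypothetical knob, overridable with -DNEST_DEPTH=<n>; the default of 3
// gives the same T as the reduced example.
#ifndef NEST_DEPTH
#define NEST_DEPTH 3
#endif

// Replacements for the two "using" lines of example.cpp.
using T = Nested<NEST_DEPTH>::type;
using TSeq = std::set<T>;

// Sanity check that depth 3 really is std::set<std::set<std::set<int>>>.
static_assert(std::is_same<Nested<3>::type,
                           std::set<std::set<std::set<int>>>>::value,
              "Nested<3> should match the original T");
```

Each depth can then be timed by adding -DNEST_DEPTH=2, 3 or 4 to the g++
command line shown above.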

The size of the output scales similarly, going from 77K to 374K to 1.5M as
nested std::sets are added.
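For reference, the per-level growth factors implied by these numbers can be
written down as a small C++ sketch. It assumes that "over 6min35" can be taken
as 395s (so the last time ratio is only a lower bound) and that "1.5M" means
1.5 * 1024K; both readings are assumptions, not measurements.

```cpp
#include <array>
#include <cstddef>

// Measurements quoted in the report, for 2, 3 and 4 nested std::set levels.
// The 4-level time is a lower bound: "over 6min35" is taken as 395 s.
constexpr std::array<double, 3> kCompileSeconds = {2.8, 46.0, 395.0};

// Output sizes in KiB, assuming "1.5M" means 1.5 * 1024 KiB.
constexpr std::array<double, 3> kOutputKiB = {77.0, 374.0, 1.5 * 1024.0};

// Growth factor from nesting level (i + 2) to nesting level (i + 3).
constexpr double growth(const std::array<double, 3>& values, std::size_t i) {
  return values[i + 1] / values[i];
}

// Compile time: about 16.4x (2 -> 3 levels), then at least about 8.6x (3 -> 4).
static_assert(growth(kCompileSeconds, 0) > 16.0, "2 -> 3 levels");
static_assert(growth(kCompileSeconds, 1) > 8.5, "3 -> 4 levels");

// Output size: about 4.9x, then about 4.1x.
static_assert(growth(kOutputKiB, 0) > 4.8, "2 -> 3 levels");
static_assert(growth(kOutputKiB, 1) > 4.0, "3 -> 4 levels");
```

In other words, each extra level of nesting multiplies compile time by roughly
8x to 16x and output size by roughly 4x to 5x, which is consistent with
exponential growth.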

Also, compiling with -O2 -fgcse-after-reload -fipa-cp-clone
-floop-interchange -floop-unroll-and-jam -fpeel-loops -fpredictive-commoning
-fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre
-funswitch-loops -fvect-cost-model=dynamic -fversion-loops-for-strides instead
of -O3 is fast too, even though it should be equivalent (these are the flags
that -O3 enables on top of -O2). I compared the output of -Q --help=optimizers
for both command lines and there is no difference, so I was unable to pinpoint
the responsible optimization pass.

This is not a new problem: gcc 4.9.4 (with -std=c++11, obviously) was also
very slow, as were the few versions in between that I spot-checked. I didn't
go further back because older versions started to require meaningful changes
to the code.
