https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102253
Bug ID: 102253
Summary: scalability issues with large loop depth
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
When investigating an improvement to LIM's fill_always_executed_in I created
the following testcase, which creates a loop nest of depth N with conditionally
executed subloops.

extern void foobar (int);
template <int a>
struct bar
{
  static void baz(int b, int)
  {
    if (b & (1 << (a % 32)))
      for (int i = 0; i < 1024; ++i)
        bar<a-1>::baz (b, i);
  }
};
template <>
struct bar<0>
{
  static void baz (int, int i) { foobar (i); }
};
void __attribute__((flatten)) foo(int b)
{
#ifndef N
#define N 10
#endif
  bar<N>::baz (b, 0);
}

For N == 900 (the maximum unless you also specify -ftemplate-depth) and -O1 we
see
 tree canonical iv   :   1.42 ( 13%)   0.00 (  0%)   1.42 ( 13%)    28M ( 13%)
 complete unrolling  :   2.80 ( 27%)   0.00 (  0%)   2.81 ( 26%)    42M ( 19%)
 integrated RA       :   3.41 ( 32%)   0.32 ( 80%)   3.72 ( 34%)   640k (  0%)
 TOTAL               :  10.54          0.40         10.96          224M
For N == 1800 and -O1 it is already
 tree canonical iv   :  30.43 ( 28%)   0.05 ( 14%)  30.50 ( 28%)   116M ( 15%)
 complete unrolling  :  63.96 ( 59%)   0.06 ( 17%)  64.04 ( 59%)   175M ( 22%)
 tree iv optimization:   5.75 (  5%)   0.00 (  0%)   5.77 (  5%)   126M ( 16%)
 integrated RA       :   1.40 (  1%)   0.12 ( 34%)   1.53 (  1%)  1754k (  0%)
 TOTAL               : 108.35          0.35        108.75          796M
For reference, compile time with N == 450 is 2.5s, with
 tree canonical iv   :   0.18 (  7%)   0.00 (  0%)   0.19 (  7%)  6904k ( 10%)
 complete unrolling  :   0.34 ( 14%)   0.00 (  0%)   0.34 ( 13%)  8412k ( 13%)