[Bug middle-end/102253] New: scalability issues with large loop depth

rguenth at gcc dot gnu.org via Gcc-bugs Thu, 09 Sep 2021 02:46:04 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102253


            Bug ID: 102253
           Summary: scalability issues with large loop depth
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

When investigating an improvement to LIM's fill_always_executed_in, I created
the following testcase, which produces a loop nest of depth N with conditionally
executed subloops.

extern void foobar (int);
template <int a>
struct bar
{
  static void baz(int b, int)
  {
    if (b & (1 << (a % 32)))
      for (int i = 0; i < 1024; ++i)
        bar<a-1>::baz (b, i);
  }
};
template <>
struct bar<0>
{
  static void baz (int, int i) { foobar (i); }
};
void __attribute__((flatten)) foo(int b)
{
#ifndef N
#define N 10
#endif
  bar<N>::baz (b, 0);
}
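
To make the shape of the resulting loop nest concrete, here is a sketch of what the flattened code should look like for N == 2, written out by hand next to the template version. The counting definition of foobar, the helper count_calls, and the names foo_template and foo_expanded are not part of the testcase; they are assumptions added only so the two variants can be compared.

```cpp
// Stand-in for the external foobar from the testcase; here it only counts
// calls (an assumption for illustration, not part of the report).
static long foobar_calls = 0;
void foobar (int) { ++foobar_calls; }

template <int a>
struct bar
{
  static void baz (int b, int)
  {
    if (b & (1 << (a % 32)))
      for (int i = 0; i < 1024; ++i)
        bar<a-1>::baz (b, i);
  }
};
template <>
struct bar<0>
{
  static void baz (int, int i) { foobar (i); }
};

// Template version with N fixed to 2.
void foo_template (int b) { bar<2>::baz (b, 0); }

// Hand-written equivalent of bar<2>::baz (b, 0) once everything is inlined:
// each level contributes one guard on a bit of b and one 1024-iteration loop.
void foo_expanded (int b)
{
  if (b & (1 << 2))                       // guard from bar<2>
    for (int i2 = 0; i2 < 1024; ++i2)
      if (b & (1 << 1))                   // guard from bar<1>
        for (int i1 = 0; i1 < 1024; ++i1)
          foobar (i1);                    // body of bar<0>
}

// Hypothetical helper: number of foobar calls made by fn (b).
long count_calls (void (*fn) (int), int b)
{
  foobar_calls = 0;
  fn (b);
  return foobar_calls;
}
```

With b == 6 both guards are taken and both variants call foobar 1024 * 1024 times; with b == 4 the outer loop runs but the inner guard fails on every iteration, so foobar is never called. For general N the nest has N such guarded levels.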

For N == 900 (the maximum unless you also specify -ftemplate-depth) and -O1 we
see

 tree canonical iv                  :   1.42 ( 13%)   0.00 (  0%)   1.42 ( 13%)    28M ( 13%)
 complete unrolling                 :   2.80 ( 27%)   0.00 (  0%)   2.81 ( 26%)    42M ( 19%)
 integrated RA                      :   3.41 ( 32%)   0.32 ( 80%)   3.72 ( 34%)   640k (  0%)
 TOTAL                              :  10.54          0.40         10.96          224M

For N == 1800 and -O1 it is already

 tree canonical iv                  :  30.43 ( 28%)   0.05 ( 14%)  30.50 ( 28%)   116M ( 15%)
 complete unrolling                 :  63.96 ( 59%)   0.06 ( 17%)  64.04 ( 59%)   175M ( 22%)
 tree iv optimization               :   5.75 (  5%)   0.00 (  0%)   5.77 (  5%)   126M ( 16%)
 integrated RA                      :   1.40 (  1%)   0.12 ( 34%)   1.53 (  1%)  1754k (  0%)
 TOTAL                              : 108.35          0.35        108.75          796M

For reference, compile time with N == 450 is 2.5s, with

 tree canonical iv                  :   0.18 (  7%)   0.00 (  0%)   0.19 (  7%)  6904k ( 10%)
 complete unrolling                 :   0.34 ( 14%)   0.00 (  0%)   0.34 ( 13%)  8412k ( 13%)
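
As a rough way to quantify the growth, one can fit a power law t ~ N^k between each pair of consecutive measurements. The sketch below does this for the first time column of the two dominant passes (user time, assuming the usual -ftime-report column order). The array and function names are made up for illustration, and with only three data points, one of them given to two significant digits, the result is indicative at best.

```cpp
#include <cmath>

// First-column times in seconds, copied from the reports above,
// for N = 450, 900 and 1800 respectively.
const double canonical_iv[3]       = { 0.18, 1.42, 30.43 };
const double complete_unrolling[3] = { 0.34, 2.80, 63.96 };

// Exponent k such that t_hi / t_lo == (n_hi / n_lo)^k, i.e. the slope
// between the two measurements on a log-log plot.
double scaling_exponent (double t_lo, double t_hi, double n_lo, double n_hi)
{
  return std::log (t_hi / t_lo) / std::log (n_hi / n_lo);
}
```

For example, scaling_exponent (canonical_iv[0], canonical_iv[1], 450, 900) comes out at about 3, and scaling_exponent (canonical_iv[1], canonical_iv[2], 900, 1800) at roughly 4.4; complete unrolling gives very similar values. So both passes look at least cubic in the loop depth over this range, with the exponent still increasing.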
