https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102577

            Bug ID: 102577
           Summary: Function frame size in relation to variables in
                    conditional subscopes
           Product: gcc
           Version: 11.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krzyk240 at gmail dot com
  Target Milestone: ---

Consider the following code:
```
int bar(unsigned short x, int* cache = nullptr) {
    if (!cache) {
        int cache[1 << (sizeof(x) * 8)] = {};
        return bar(x, cache);
    }
    if (cache[x] != 0) {
        return cache[x];
    }
    if (x == 0) {
        cache[x] = 1;
    } else {
        cache[x] = (bar(x - 1, cache) + bar(x / 2, cache)) % 1000000007;
    }
    return cache[x];
}

int bar2(unsigned short x, int* cache = nullptr) {
    if (!cache) {
        return [&] {
            int cache[1 << (sizeof(x) * 8)] = {};
            return bar2(x, cache);
        }();
    }
    if (cache[x] != 0) {
        return cache[x];
    }
    if (x == 0) {
        cache[x] = 1;
    } else {
        cache[x] = (bar2(x - 1, cache) + bar2(x / 2, cache)) % 1000000007;
    }
    return cache[x];
}

#include <cstdio>
#include <cstdlib>

int main() {
    printf("%i\n", bar(65535));
    system("grep Vm /proc/$PPID/status");
}
```

The only difference between functions bar() and bar2() the first if -- in
bar2() its contents are wrapped inside immediately invoked lambda.

What is important, on the first call to bar() or bar2(), a big local array will
be created on stack, and then in each recursive invocation, it will be not
created. So this big local array is created ONCE if we analyze the control
flow.

The depth of the recursion is x, and I assume sizeof(unsigned short) == 2. So
the program should use little memory (at most, a couple of MB).

Unfortunately this program consumes much more memory than expected, VmRSS is
around 270 MB, and VmStk is around 16 GB. Invoking it compiled with O2 (g++
abc.cc -O2 -o abc), produces:
```
964369483
VmPeak: 16787136 kB
VmSize: 16787100 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    269272 kB
VmRSS:    269272 kB
VmData:      240 kB
VmStk:  16781320 kB
VmExe:         4 kB
VmLib:      3188 kB
VmPTE:     32892 kB
VmSwap:        0 kB
```

If we change invocation `bar(65535)` to `bar2(65535)` it produces:
```
964369483
VmPeak:     9148 kB
VmSize:     9112 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      6092 kB
VmRSS:      6092 kB
VmData:      240 kB
VmStk:      3332 kB
VmExe:         4 kB
VmLib:      3188 kB
VmPTE:        56 kB
VmSwap:        0 kB
```
Which are the expected numbers.

It seems that GCC propagates allocation of stack memory for the local array to
the function scope ignoring enclosing conditional scope. So that stack frame
size of bar() is always greater than 65536 * 4b. Clang seems behave in the same
way as GCC.

I tried to find some information why it is done this way, but failed to find
anything (I checked, cppreference, c++17 standard, stack overflow etc.). So
here is my question:
Is this behavior compliant with C++ standard, or is it a bug, or is it an
optimization that rarely has an adverse impact?

IMO, moving big stack memory allocations from conditional subscopes to function
scope, especially if this function is recursive is not a good idea. This
behavior was quite surprising for me when I discovered why similar program uses
(wastes) so much memory.

Reply via email to