https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102577
Bug ID: 102577 Summary: Function frame size in relation to variables in conditional subscopes Product: gcc Version: 11.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: krzyk240 at gmail dot com Target Milestone: --- Consider the following code: ``` int bar(unsigned short x, int* cache = nullptr) { if (!cache) { int cache[1 << (sizeof(x) * 8)] = {}; return bar(x, cache); } if (cache[x] != 0) { return cache[x]; } if (x == 0) { cache[x] = 1; } else { cache[x] = (bar(x - 1, cache) + bar(x / 2, cache)) % 1000000007; } return cache[x]; } int bar2(unsigned short x, int* cache = nullptr) { if (!cache) { return [&] { int cache[1 << (sizeof(x) * 8)] = {}; return bar2(x, cache); }(); } if (cache[x] != 0) { return cache[x]; } if (x == 0) { cache[x] = 1; } else { cache[x] = (bar2(x - 1, cache) + bar2(x / 2, cache)) % 1000000007; } return cache[x]; } #include <cstdio> #include <cstdlib> int main() { printf("%i\n", bar(65535)); system("grep Vm /proc/$PPID/status"); } ``` The only difference between functions bar() and bar2() the first if -- in bar2() its contents are wrapped inside immediately invoked lambda. What is important, on the first call to bar() or bar2(), a big local array will be created on stack, and then in each recursive invocation, it will be not created. So this big local array is created ONCE if we analyze the control flow. The depth of the recursion is x, and I assume sizeof(unsigned short) == 2. So the program should use little memory (at most, a couple of MB). Unfortunately this program consumes much more memory than expected, VmRSS is around 270 MB, and VmStk is around 16 GB. Invoking it compiled with O2 (g++ abc.cc -O2 -o abc), produces: ``` 964369483 VmPeak: 16787136 kB VmSize: 16787100 kB VmLck: 0 kB VmPin: 0 kB VmHWM: 269272 kB VmRSS: 269272 kB VmData: 240 kB VmStk: 16781320 kB VmExe: 4 kB VmLib: 3188 kB VmPTE: 32892 kB VmSwap: 0 kB ``` If we change invocation `bar(65535)` to `bar2(65535)` it produces: ``` 964369483 VmPeak: 9148 kB VmSize: 9112 kB VmLck: 0 kB VmPin: 0 kB VmHWM: 6092 kB VmRSS: 6092 kB VmData: 240 kB VmStk: 3332 kB VmExe: 4 kB VmLib: 3188 kB VmPTE: 56 kB VmSwap: 0 kB ``` Which are the expected numbers. It seems that GCC propagates allocation of stack memory for the local array to the function scope ignoring enclosing conditional scope. So that stack frame size of bar() is always greater than 65536 * 4b. Clang seems behave in the same way as GCC. I tried to find some information why it is done this way, but failed to find anything (I checked, cppreference, c++17 standard, stack overflow etc.). So here is my question: Is this behavior compliant with C++ standard, or is it a bug, or is it an optimization that rarely has an adverse impact? IMO, moving big stack memory allocations from conditional subscopes to function scope, especially if this function is recursive is not a good idea. This behavior was quite surprising for me when I discovered why similar program uses (wastes) so much memory.