https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100811
            Bug ID: 100811
           Summary: Consider not omitting frame pointers by default on
                    targets with many registers
           Product: gcc
           Version: 10.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: grasland at lal dot in2p3.fr
  Target Milestone: ---

Since at least GCC 4 (Bugzilla's duplicate search points me to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=13822), GCC has been omitting frame pointers by default when optimizations are enabled, unless the extra -fno-omit-frame-pointer flag is specified.

As far as I know, the rationale for this was that:

- On architectures with very few general-purpose registers, like 32-bit x86, strictly following frame pointer retention discipline has a prohibitive performance cost.
- Debuggers do not need frame pointers to do their job, as they can leverage DWARF or PDB debug information instead.

While these arguments are valid, I would like to make the case that frame pointers may be worth keeping by default on hardware architectures where this is not too expensive (like x86_64), for the purpose of making software performance analysis easier.

Unlike debuggers, sampling profilers like perf cannot afford the luxury of walking the process stack using DWARF every time a sample is taken, as that would take too much time and bias the measured performance profile. Instead, when using DWARF for stack unwinding, they have to copy raw stack samples and post-process them after the fact. Furthermore, since copying the full program stack on every sample would generate an unbearable volume of data, they can usually only afford to copy the top of the stack (at most the upper 64 KB for perf), which leads to corrupted stack traces when application stacks get deep or contain many or large stack-allocated objects.

For all these reasons, DWARF-based stack unwinding is a somewhat unreliable technique in profiling: it is really hard to get more than 90% of a profile's stack traces correctly reconstructed all the way to _start or _clone. The remaining misreconstructed stack traces translate into profile bias (underestimated "children" overhead measurements), and thus into performance analysis mistakes.

To make matters worse, DWARF-based unwinding is relatively complex, and not every useful runtime performance analysis tool supports it. For example, BPF-based tracing tools, which are becoming popular nowadays due to their highly appealing ability to instrument any kernel or user function on the fly, do not currently support DWARF-based stack unwinding. This is most likely because feeding the DWARF debug info into the kernel-based BPF program would be either prohibitively expensive, a security hole, or a source of recursive tracing incidents (the tracing tool generates the very kind of syscalls it is tracing, creating an infinite loop).

Therefore, I think -fno-omit-frame-pointer should be the default on architectures where the price to pay is not too high (like x86_64), which should ensure that modern performance analysis tooling works on all popular Linux distributions without rebuilding the entire world. In this scheme, -fomit-frame-pointer would remain the default on targets where it is really needed (like legacy 32-bit x86), and become a specialist option for those cases where the extra ~1% of performance is truly needed and worth its cost.

What do you think?
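
For concreteness, here is what the flag changes in generated code on x86_64. This is an illustrative sketch only: the assembly shown in the comments is typical GCC -O2 output, but the exact instructions vary with GCC version and surrounding code.

    /* flag_demo.c - what -f(no-)omit-frame-pointer changes on x86_64. */
    long sum3(long a, long b, long c)
    {
        return a + b + c;
    }

    /* With `gcc -O2 -fno-omit-frame-pointer -c flag_demo.c`, the function
     * carries a two-instruction prologue and a one-instruction epilogue
     * that link its frame into the %rbp chain, so an unwinder can walk
     * the stack:
     *
     *     push   %rbp
     *     mov    %rsp, %rbp
     *     lea    (%rdi,%rsi), %rax
     *     add    %rdx, %rax
     *     pop    %rbp
     *     ret
     *
     * With plain `gcc -O2 -c flag_demo.c`, the prologue and epilogue
     * disappear and %rbp becomes one more general-purpose register for
     * the register allocator:
     *
     *     lea    (%rdi,%rsi), %rax
     *     add    %rdx, %rax
     *     ret
     *
     * On x86_64 (16 GPRs), giving up %rbp as an allocatable register
     * typically costs on the order of the ~1% mentioned above; on
     * 32-bit x86 (8 GPRs), the cost is much higher, hence the historic
     * default. */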
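And here is why frame pointers make life easy for sampling profilers: with the %rbp chain intact, a backtrace is just a linked-list walk, one memory dereference per frame, with no DWARF tables and no stack copying involved. Below is a minimal sketch under stated assumptions: the standard x86_64 frame layout (saved %rbp at [%rbp], return address at [%rbp+8]), a hypothetical file name, and an arbitrary depth cap; a real unwinder would also sanity-check the pointers it follows.

    /* fp_walk.c - walk the frame pointer chain the way an fp-based
     * unwinder would.  Build with:
     *   gcc -O2 -fno-omit-frame-pointer -fno-optimize-sibling-calls fp_walk.c
     * Assumes the standard x86_64 frame layout:
     *   [%rbp]   = caller's saved %rbp (next link in the chain)
     *   [%rbp+8] = return address into the caller
     */
    #include <stdio.h>

    struct frame {
        struct frame *next;  /* saved caller %rbp */
        void *ret_addr;      /* return address into the caller */
    };

    __attribute__((noinline)) static void backtrace_fp(void)
    {
        /* GCC builtin: address of the current frame, i.e. %rbp. */
        struct frame *fp = __builtin_frame_address(0);

        /* One dereference per frame: O(stack depth), no debug info
         * needed.  The depth cap of 16 is an arbitrary safety limit
         * for this sketch; glibc zeroes %rbp near _start, which also
         * terminates the walk. */
        for (int depth = 0; fp && depth < 16; depth++, fp = fp->next)
            printf("frame %2d: return address %p\n", depth, fp->ret_addr);
    }

    __attribute__((noinline)) static void leaf(void)   { backtrace_fp(); }
    __attribute__((noinline)) static void middle(void) { leaf(); }

    int main(void)
    {
        middle();  /* prints return addresses through leaf, middle, main */
        return 0;
    }

Build the same file with plain -O2 instead and the chain is gone: %rbp no longer points at a frame link, and the walk produces garbage or crashes, which is exactly the situation fp-based profiling tools find themselves in on distribution-built binaries today.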