https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100811
            Bug ID: 100811
           Summary: Consider not omitting frame pointers by default on
                    targets with many registers
           Product: gcc
           Version: 10.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: grasland at lal dot in2p3.fr
  Target Milestone: ---

Since at least GCC 4 (Bugzilla's duplicate search points me to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=13822), GCC has been omitting frame pointers by default when optimizations are enabled, unless the extra -fno-omit-frame-pointer flag is specified.

As far as I know, the rationale for this was that:

- On architectures with very few general-purpose registers, like 32-bit x86, strictly following frame pointer retention discipline has a prohibitive performance cost.
- Debuggers do not need frame pointers to do their job, as they can leverage DWARF or PDB debug information instead.

While these arguments are valid, I would like to make the case that frame pointers may be worth keeping by default on hardware architectures where this is not too expensive (like x86_64), for the purpose of making software performance analysis easier.

Unlike debuggers, sampling profilers like perf cannot afford the luxury of walking the process stack using DWARF every time a sample is taken, as that would take too much time and bias the measured performance profile. Instead, when using DWARF for stack unwinding, they have to copy raw stack samples and post-process them after the fact. Furthermore, since copying the full program stack on every sample would generate an unbearable volume of data, they can usually only afford to copy the top of the stack (at most the upper 64 KB for perf), which leads to corrupted stack traces when application stacks get deep or contain many or large stack-allocated objects.

For all these reasons, DWARF-based stack unwinding is a somewhat unreliable technique in profiling: it is really hard to get more than 90% of a profile's stack traces correctly reconstructed all the way to _start or _clone. The remaining misreconstructed stack traces translate into profile bias (underestimated "children" overhead measurements), and thus into performance analysis mistakes.

To make matters worse, DWARF-based unwinding is relatively complex, and not every useful runtime performance analysis tool supports it. For example, BPF-based tracing tools, which are becoming popular nowadays due to their highly appealing ability to instrument any kernel or user function on the fly, do not currently support DWARF-based stack unwinding. This is most likely because feeding the DWARF debug info into the kernel-based BPF program would be either prohibitively expensive, a security hole, or a source of recursive tracing incidents (the tracing tool generates the very kind of syscalls it is tracing, creating an infinite loop).

Therefore, I think -fno-omit-frame-pointer should be the default on architectures where the price to pay is not too high (like x86_64), which should ensure that modern performance analysis tooling works on all popular Linux distributions without rebuilding the entire world. In this scheme, -fomit-frame-pointer would remain the default on targets where it is really needed (like legacy 32-bit x86), and become a specialist option for those cases where the extra ~1% of performance is truly needed and worth its cost.

What do you think?
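
For concreteness, here is what the flag changes in generated code on x86_64. This is an illustrative sketch only: the assembly shown in the comments is typical GCC -O2 output, but the exact instructions vary with GCC version and surrounding code.

    /* flag_demo.c - what -f(no-)omit-frame-pointer changes on x86_64. */
    long sum3(long a, long b, long c)
    {
        return a + b + c;
    }

    /* With `gcc -O2 -fno-omit-frame-pointer -c flag_demo.c`, the function
     * carries a two-instruction prologue and a one-instruction epilogue
     * that link its frame into the %rbp chain, so an unwinder can walk
     * the stack:
     *
     *     push   %rbp
     *     mov    %rsp, %rbp
     *     lea    (%rdi,%rsi), %rax
     *     add    %rdx, %rax
     *     pop    %rbp
     *     ret
     *
     * With plain `gcc -O2 -c flag_demo.c`, the prologue and epilogue
     * disappear and %rbp becomes one more general-purpose register for
     * the register allocator:
     *
     *     lea    (%rdi,%rsi), %rax
     *     add    %rdx, %rax
     *     ret
     *
     * On x86_64 (16 GPRs), giving up %rbp as an allocatable register
     * typically costs on the order of the ~1% mentioned above; on
     * 32-bit x86 (8 GPRs), the cost is much higher, hence the historic
     * default. */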
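And here is why frame pointers make life easy for sampling profilers: with the %rbp chain intact, a backtrace is just a linked-list walk, one memory dereference per frame, with no DWARF tables and no stack copying involved. Below is a minimal sketch under stated assumptions: the standard x86_64 frame layout (saved %rbp at [%rbp], return address at [%rbp+8]), a hypothetical file name, and an arbitrary depth cap; a real unwinder would also sanity-check the pointers it follows.

    /* fp_walk.c - walk the frame pointer chain the way an fp-based
     * unwinder would.  Build with:
     *   gcc -O2 -fno-omit-frame-pointer -fno-optimize-sibling-calls fp_walk.c
     * Assumes the standard x86_64 frame layout:
     *   [%rbp]   = caller's saved %rbp (next link in the chain)
     *   [%rbp+8] = return address into the caller
     */
    #include <stdio.h>

    struct frame {
        struct frame *next;  /* saved caller %rbp */
        void *ret_addr;      /* return address into the caller */
    };

    __attribute__((noinline)) static void backtrace_fp(void)
    {
        /* GCC builtin: address of the current frame, i.e. %rbp. */
        struct frame *fp = __builtin_frame_address(0);

        /* One dereference per frame: O(stack depth), no debug info
         * needed.  The depth cap of 16 is an arbitrary safety limit
         * for this sketch; glibc zeroes %rbp near _start, which also
         * terminates the walk. */
        for (int depth = 0; fp && depth < 16; depth++, fp = fp->next)
            printf("frame %2d: return address %p\n", depth, fp->ret_addr);
    }

    __attribute__((noinline)) static void leaf(void)   { backtrace_fp(); }
    __attribute__((noinline)) static void middle(void) { leaf(); }

    int main(void)
    {
        middle();  /* prints return addresses through leaf, middle, main */
        return 0;
    }

Build the same file with plain -O2 instead and the chain is gone: %rbp no longer points at a frame link, and the walk produces garbage or crashes, which is exactly the situation fp-based profiling tools find themselves in on distribution-built binaries today.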