[Changes from V8:
- Rebased to today's master.
- Adapted to use the new write-symbols infrastructure recently
applied upstream.
- Small change in libiberty to copy .BTF sections over when
LTOing.]
Hi people!
Last year we submitted a first patch series introducing support for
the CTF debugging format in GCC [1]. We got a lot of feedback that
prompted us to change the approach used to generate the debug info,
and this patch series is the result of that.
This series also adds support for the BTF debug format, which is needed
by the BPF backend (more on this below).
This implementation works, but there are several points that need
discussion and agreement with the upstream community, as they impact
the way debugging options work. We are also proposing a way to add
additional debugging formats in the future. See below for more
details.
Finally, a patch makes the BPF GCC backend use the DWARF debug
hooks in order to make -gbtf available to it.
[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2019-05/msg01297.html
About CTF
=========
CTF is a debugging format designed to express C types in a
very compact way. Its key goals are compactness and simplicity. For more
information see:
- CTF specification
http://www.esperi.org.uk/~oranix/ctf/ctf-spec.pdf
- Compact C-Type support in the GNU toolchain (talk + slides)
https://linuxplumbersconf.org/event/4/contributions/396/
- On type de-duplication in CTF (talk + slides)
https://linuxplumbersconf.org/event/7/contributions/725/
About BTF
=========
BTF is a debugging format, similar to CTF, that is used in the Linux
kernel as the debugging format for BPF programs. From the kernel
documentation:
"BTF (BPF Type Format) is the metadata format which encodes the debug
info related to BPF program/map. The name BTF was used initially to
describe data types. The BTF was later extended to include function
info for defined subroutines, and line info for source/line
information."
Supporting BTF in GCC is important because compiled BPF programs
(which GCC supports as a target) require the type information in order
to be loaded and run in diverse kernel versions. This mechanism is
known as CO-RE (compile-once, run-everywhere) and is described in the
"Update of the BPF support in the GNU Toolchain" talk mentioned below.
BTF is documented in the Linux kernel documentation tree:
- linux/Documentation/bpf/btf.rst
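As a rough illustration of how simple the format is at the top level, here is a sketch of a sanity check on the header that starts a .BTF section. The field layout and the magic value follow the kernel's btf.rst; the function name btf_header_plausible is hypothetical, and the sketch assumes the data is in the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BTF_MAGIC 0xeB9F

/* Header layout as documented in linux/Documentation/bpf/btf.rst.  */
struct btf_header {
  uint16_t magic;
  uint8_t  version;
  uint8_t  flags;
  uint32_t hdr_len;
  /* Offsets are in bytes, relative to the end of this header.  */
  uint32_t type_off;  /* offset of the type section */
  uint32_t type_len;  /* length of the type section */
  uint32_t str_off;   /* offset of the string section */
  uint32_t str_len;   /* length of the string section */
};

static_assert (sizeof (struct btf_header) == 24, "unexpected padding");

/* Hypothetical helper: return 1 if BUF (LEN bytes) starts with a
   native-endian BTF header whose type and string sections fit inside
   the buffer, 0 otherwise.  */
int
btf_header_plausible (const void *buf, size_t len)
{
  struct btf_header h;

  if (len < sizeof h)
    return 0;
  memcpy (&h, buf, sizeof h);
  if (h.magic != BTF_MAGIC)
    return 0;
  if (h.hdr_len < sizeof h || h.hdr_len > len)
    return 0;

  size_t body = len - h.hdr_len;
  if (h.type_off > body || h.type_len > body - h.type_off)
    return 0;
  if (h.str_off > body || h.str_len > body - h.str_off)
    return 0;
  return 1;
}
```

A real consumer would also have to handle the opposite byte order and validate the individual type records, which this sketch does not attempt.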
CTF in the GNU Toolchain
========================
During the last year we have been working on adding support for CTF to
several components of the GNU toolchain:
- binutils support is already upstream. It supports linking objects
with CTF information with full type de-duplication.
- GDB support is to be sent upstream very shortly. It makes the
debugger capable of using the CTF information whenever available.
This is useful in cases where DWARF has been stripped out but CTF is
kept.
- GCC support is being discussed and submitted in this series.
Overview of the Implementation
==============================
dwarf2out.c
  The enabled debug formats are hooked in dwarf2out_early_finish.
dwarf2int.h
  Internal interface that exports a few functions and data types
  defined in dwarf2out.c.
dwarf2ctf.c
  Code that transforms the internal GCC DWARF DIEs into CTF container
  structures. This file uses the dwarf2int.h interface.
ctfc.c
ctfc.h
  These two files implement the "CTF container", which is shared
  between CTF and BTF because of the many similarities between the
  two formats.
ctfout.c
  Code that emits assembler with the .ctf section data, from the CTF
  container.
btfout.c
  Code that emits assembler with the .BTF section data, from the CTF
  container.
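The "one container, two emitters" layout described above can be pictured with the following toy sketch. It is not code from this series: every name (toy_type, toy_container, toy_ctf_output, toy_btf_output) is hypothetical, and the real emitters encode their sections very differently from each other, whereas here they differ only in the section name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for the CTF container.  */
struct toy_type { const char *name; unsigned size; };
struct toy_container { const struct toy_type *types; size_t ntypes; };

/* Shared helper: write a section directive followed by one comment
   line per type into OUT (capacity CAP).  Returns the number of
   characters written, or -1 if the output would be truncated.  */
static int
toy_emit (const struct toy_container *c, const char *section,
          char *out, size_t cap)
{
  assert (c != NULL && section != NULL && out != NULL);

  int n = snprintf (out, cap, "\t.section %s\n", section);
  if (n < 0 || (size_t) n >= cap)
    return -1;

  size_t used = (size_t) n;
  for (size_t i = 0; i < c->ntypes; i++)
    {
      n = snprintf (out + used, cap - used, "\t# type %s, %u bytes\n",
                    c->types[i].name, c->types[i].size);
      if (n < 0 || (size_t) n >= cap - used)
        return -1;
      used += (size_t) n;
    }
  return (int) used;
}

/* Two back ends reading the same container.  */
int
toy_ctf_output (const struct toy_container *c, char *out, size_t cap)
{
  return toy_emit (c, ".ctf", out, cap);
}

int
toy_btf_output (const struct toy_container *c, char *out, size_t cap)
{
  return toy_emit (c, ".BTF", out, cap);
}
```

The point of the sketch is only the data flow: the container is filled once, and each output format reads it independently.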
From debug hooks to debug formats
=================================
Our first attempt at adding CTF to GCC used the obvious approach of
adding a new set of debug hooks as defined in gcc/debug.h.
During our first interaction with the upstream community we were told
to _not_ use debug hooks, because these are to be obsoleted at some
point. It was suggested that we instead hook our handlers (which
processed type TREE nodes, producing CTF types from them) somewhere
else. So we did.
However, at the time we were also facing the need to support BTF, which
is another type-related debug format needed by the BPF GCC backend.
Hooking here and there doesn't sound like such a good idea when it
comes to supporting several debug formats.
Therefore we thought about how to make GCC support diverse debugging
formats in a better way. This led to a proposal we tried to discuss
at the GNU Tools Track in LPC2020:
- Update of the BPF support in the GNU Toolchain
https://linuxplumbersconf.org/event/7/contributions/724/
Basically, the current situation in terms of diversity of debugging
formats in GCC can be summarized as follows:
      tree --+                  +--> dwarf2out
       rtl --+                  +--> dbxout
             +--> debug_hooks --+--> vmsdbgout
  backends --+                  +--> xcoffout
       lto --+                  +--> godump
i.e. each debug format materializes as a set of debug hooks, as defined
in gcc/debug.h. The installed hooks are then invoked from many different
areas of the compiler, including the front-end, middle-end, back-end and
also LTO. Most of the hooks get TREE objects, from which they are
supposed to extract/infer whatever information they need to express.
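To make the shape of this mechanism concrete, here is a toy sketch of a hook table driven from compiler call sites. It is only an illustration: the real structure in gcc/debug.h has many more members, and all names below (toy_tree, toy_debug_hooks, toy_front_end_saw_type, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a TREE node; the real type is far richer.  */
struct toy_tree { const char *name; };

/* Hypothetical hook table: one function pointer per compiler event.  */
struct toy_debug_hooks {
  void (*type_decl) (const struct toy_tree *decl);
  void (*finish) (const char *filename);
};

/* State of one toy debug format.  */
static int toy_types_seen;
static int toy_finished;

static void
toy_fmt_type_decl (const struct toy_tree *decl)
{
  /* A real hook would dig through DECL here; this is the part that
     has to track every change made to the TREE nodes.  */
  assert (decl != NULL);
  toy_types_seen++;
}

static void
toy_fmt_finish (const char *filename)
{
  (void) filename;
  toy_finished = 1;
}

const struct toy_debug_hooks toy_fmt_hooks = {
  toy_fmt_type_decl,
  toy_fmt_finish
};

/* The compiler holds one pointer to the active hook table.  */
const struct toy_debug_hooks *toy_active_hooks = &toy_fmt_hooks;

/* Stand-ins for call sites scattered across the compiler.  */
void
toy_front_end_saw_type (const struct toy_tree *decl)
{
  toy_active_hooks->type_decl (decl);
}

void
toy_compilation_done (const char *filename)
{
  toy_active_hooks->finish (filename);
}
```

Each additional debug format in this scheme means another full table whose handlers all interpret TREE nodes on their own.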
This approach has several problems, some of which were raised by you
people when we initially submitted the CTF support:
- The handlers depend on the TREE nodes, so if new TREE nodes are
added to cover new languages, or functionality in existing
languages, all the debug hooks may need to be updated to reflect it.
- This also happens when the contents of existing TREE node types
change or get expanded.
- The semantics encoded in TREE nodes usually are not in the best form
to be used by debug formats. This implies that the several sets of
debug hooks need to do very similar transformations, which again
will have to be adjusted/corrected if the TREE nodes change.
- And more...
In contrast, this is how LLVM supports several debug formats:
                                  +--> DWARF
  IR --> class DebugHandlerBase --+--> CodeView
                                  +--> BTF
i.e. LLVM gets debugging information as part of the IR, and then has
debug info backends in the form of instances of DebugHandlerBase,
which process that subset of the IR to produce the corresponding debug
output.
To overcome the problems above, we thought about introducing a new set
of debug hooks, resulting in something like this:
               +--> godump
               +--> xcoffout
  debug_hooks -+--> vmsdbgout
               +--> dbxout                        +--> DWARF
               +--> dwarf2out --> n_debug_hooks --+--> BTF
                             (walk)               +--> CTF
                                                  ... more ...
See how these "new debug hooks" are intended to be called by the old
DWARF debug hooks. In this way:
- The internal DWARF representation becomes the canonical (and only)
IR for debugging information in the compiler. This is similar to
what LLVM uses to implement support for DWARF, BTF and the Microsoft
debug format.
- Debug formats (like CTF, BTF, stabs, etc) are implemented to provide
a very simple API that traverses the DWARF DIE trees available in
dwarf2out.
- The semantics expressed in the DWARF DIEs, which have already been
extracted from the TREE nodes, are free of many internal details and
more suitable to be easily translated into whatever abstractions the
debug formats require.
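The idea of a debug format as a consumer of a DIE-tree walk can be sketched as follows. This is a toy model under stated assumptions, not the interface proposed in the series: toy_die is a drastically reduced DIE, the tag values are made up rather than real DW_TAG_* numbers, and toy_debug_format and toy_walk_dies are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up tags for the sketch; not the real DW_TAG_* values.  */
enum toy_tag {
  TOY_TAG_COMPILE_UNIT,
  TOY_TAG_BASE_TYPE,
  TOY_TAG_STRUCT_TYPE,
  TOY_TAG_MEMBER,
  TOY_TAG_SUBPROGRAM
};

/* Toy DIE: a tag, a name, and child/sibling links.  */
struct toy_die {
  enum toy_tag tag;
  const char *name;
  const struct toy_die *child;    /* first child, or NULL */
  const struct toy_die *sibling;  /* next sibling, or NULL */
};

/* Hypothetical "debug format": a callback invoked for each DIE,
   plus private state owned by the format.  */
struct toy_debug_format {
  const char *name;
  void (*visit) (const struct toy_die *die, void *state);
  void *state;
};

/* Depth-first, pre-order walk over DIE and its following siblings.  */
void
toy_walk_dies (const struct toy_die *die,
               const struct toy_debug_format *fmt)
{
  assert (fmt != NULL && fmt->visit != NULL);
  for (; die != NULL; die = die->sibling)
    {
      fmt->visit (die, fmt->state);
      toy_walk_dies (die->child, fmt);
    }
}

/* Example format: counts type DIEs and ignores everything else,
   much as a type-only format would.  STATE points to an unsigned.  */
void
toy_count_types (const struct toy_die *die, void *state)
{
  unsigned *count = state;
  if (die->tag == TOY_TAG_BASE_TYPE || die->tag == TOY_TAG_STRUCT_TYPE)
    ++*count;
}
```

Note that the format never sees a TREE node: everything it needs arrives already digested in the DIEs, so changes to the TREE representation are absorbed in one place.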
To avoid misunderstandings, we came to refer to these "new debug hooks"
simply as "debug formats".
In this patch series we are using this latter approach in order to
support both CTF and BTF, and we can say we are happy about using the
internal DWARF DIEs as a source instead of TREE nodes: it led to a
more natural implementation that is much easier to understand. This sort
of confirms in practice that the approach is sound.