On 6/21/21 10:01 AM, Richard Biener wrote:
On Mon, May 31, 2021 at 7:16 PM Jose E. Marchesi via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:

[Changes from V8:
- Rebased to today's master.
- Adapted to use the write-symbols new infrastructure recently
   applied upstream.
- Little change in libiberty to copy .BTF sections over when
   LTOing.]

Hi people!

Last year we submitted a first patch series introducing support for
the CTF debugging format in GCC [1].  We got a lot of feedback that
prompted us to change the approach used to generate the debug info,
and this patch series is the result of that.

This series also adds support for the BTF debug format, which is needed
by the BPF backend (more on this below.)

This implementation works, but there are several points that need
discussion and agreement with the upstream community, as they impact
the way debugging options work.  We are also proposing a way to add
additional debugging formats in the future.  See below for more
details.

Finally, a patch makes the BPF GCC backend use the DWARF debug
hooks in order to make -gbtf available to it.

[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2019-05/msg01297.html

About CTF
=========

CTF is a debugging format designed in order to express C types in a
very compact way.  The key is compactness and simplicity.  For more
information see:

- CTF specification
   http://www.esperi.org.uk/~oranix/ctf/ctf-spec.pdf

- Compact C-Type support in the GNU toolchain (talk + slides)
   https://linuxplumbersconf.org/event/4/contributions/396/

- On type de-duplication in CTF (talk + slides)
   https://linuxplumbersconf.org/event/7/contributions/725/

About BTF
=========

BTF is a debugging format, similar to CTF, that is used in the Linux
kernel as the debugging format for BPF programs.  From the kernel
documentation:

"BTF (BPF Type Format) is the metadata format which encodes the debug
  info related to BPF program/map. The name BTF was used initially to
  describe data types. The BTF was later extended to include function
  info for defined subroutines, and line info for source/line
  information."

Supporting BTF in GCC is important because compiled BPF programs
(which GCC supports as a target) require the type information in order
to be loaded and run in diverse kernel versions.  This mechanism is
known as CO-RE (compile-once, run-everywhere) and is described in the
"Update of the BPF support in the GNU Toolchain" talk mentioned below.

The BTF is documented in the Linux kernel documentation tree:
- linux/Documentation/bpf/btf.rst

CTF in the GNU Toolchain
========================

During the last year we have been working on adding support for CTF to
several components of the GNU toolchain:

- binutils support is already upstream.  It supports linking objects
   with CTF information with full type de-duplication.

- GDB support is to be sent upstream very shortly.  It makes the
   debugger capable of using the CTF information whenever available.
   This is useful in cases where DWARF has been stripped out but CTF is
   kept.

- GCC support is being discussed and submitted in this series.

Overview of the Implementation
==============================

   dwarf2out.c

     The enabled debug formats are hooked in dwarf2out_early_finish.

   dwarf2int.h

     Internal interface that exports a few functions and data types
     defined in dwarf2out.c.

   dwarf2ctf.c

     Code that transforms the internal GCC DWARF DIEs into CTF container
     structures.  This file uses the dwarf2int.h interface.

   ctfc.c
   ctfc.h

     These two files implement the "CTF container", which is shared
     among CTF and BTF, due to the many similarities between both
     formats.

   ctfout.c

     Code that emits assembler with the .ctf section data, from the CTF
     container.

   btfout.c

     Code that emits assembler with the .BTF section data, from the CTF
     container.
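
As an illustration only (this is not code from the patch series), the
idea of one container feeding two emitters can be sketched in a few
lines of C.  Every name below (toy_container, toy_add_type,
toy_emit_ctf, toy_emit_btf) is hypothetical, and the output is a
placeholder rather than the real .ctf or .BTF encoding:

```c
/* Toy model only: hypothetical names, not the ctfc.h interface.  */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_TYPES 8

struct toy_type
{
  const char *name;
  unsigned size;   /* size in bytes */
};

/* The shared container: filled once, read by every emitter.  */
struct toy_container
{
  struct toy_type types[TOY_MAX_TYPES];
  unsigned ntypes;
};

/* Add a type and return its index, or -1 if the container is full.  */
int
toy_add_type (struct toy_container *c, const char *name, unsigned size)
{
  assert (c != NULL && name != NULL);
  if (c->ntypes == TOY_MAX_TYPES)
    return -1;
  c->types[c->ntypes].name = name;
  c->types[c->ntypes].size = size;
  return (int) c->ntypes++;
}

/* Common helper: write a section directive followed by one placeholder
   line per type.  Returns the number of characters produced, or -1.  */
static int
toy_emit (const struct toy_container *c, const char *section,
          char *buf, size_t len)
{
  int n = snprintf (buf, len, "\t.section\t%s\n", section);
  unsigned i;

  for (i = 0; i < c->ntypes && n >= 0 && (size_t) n < len; i++)
    {
      int r = snprintf (buf + n, len - (size_t) n, "\t# type %u: %s (%u)\n",
                        i, c->types[i].name, c->types[i].size);
      if (r < 0)
        return -1;
      n += r;
    }
  return n;
}

/* The two emitters differ only in how they serialize the same data;
   here that difference is reduced to the section name.  */
int
toy_emit_ctf (const struct toy_container *c, char *buf, size_t len)
{
  return toy_emit (c, ".ctf", buf, len);
}

int
toy_emit_btf (const struct toy_container *c, char *buf, size_t len)
{
  return toy_emit (c, ".BTF", buf, len);
}

/* Fill one container and emit it twice; returns 1 on success.  */
int
toy_container_demo (void)
{
  struct toy_container c;
  char ctf[256], btf[256];

  memset (&c, 0, sizeof c);
  toy_add_type (&c, "int", 4);
  toy_add_type (&c, "long", 8);
  toy_emit_ctf (&c, ctf, sizeof ctf);
  toy_emit_btf (&c, btf, sizeof btf);
  return strstr (ctf, ".ctf") != NULL && strstr (btf, ".BTF") != NULL;
}
```

The point of the sketch is only the shape: the type information is
collected once, and each output format is a separate, small consumer
of it.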

From debug hooks to debug formats
=================================

Our first attempt at adding CTF to GCC used the obvious approach of
adding a new set of debug hooks as defined in gcc/debug.h.

During our first interaction with the upstream community we were told
to _not_ use debug hooks, because these are to be obsoleted at some
point.  It was suggested that we instead hook our handlers (which
processed type TREE nodes, producing CTF types from them) somewhere
else.  So we did.

However at the time we were also facing the need to support BTF, which
is another type-related debug format needed by the BPF GCC backend.
Hooking here and there doesn't sound like such a good idea when it
comes to supporting several debug formats.

Therefore we thought about how to make GCC support diverse debugging
formats in a better way.  This led to a proposal we tried to discuss
at the GNU Tools Track in LPC2020:

- Update of the BPF support in the GNU Toolchain
   https://linuxplumbersconf.org/event/7/contributions/724/

Basically, the current situation in terms of diversity of debugging
formats in GCC can be summarized as follows:

      tree     --+                  +--> dwarf2out
      rtl      --+                  +--> dbxout
                 +--> debug_hooks --+--> vmsdbgout
      backends --+                  +--> xcoffout
      lto      --+                  +--> godump

i.e. each debug format materializes in a set of debug hooks, as in
gcc/debug.h.  The installed hooks are then invoked from many different
areas of the compiler including front-end, middle-end, back-end and
also lto.  Most of the hooks get TREE objects, from which they are
supposed to extract/infer whatever information they need to express.

This approach has several problems, some of which were raised by you
people when we initially submitted the CTF support:

- The handlers depend on the TREE nodes, so if new TREE nodes are
   added to cover new languages, or functionality in existing
   languages, all the debug hooks may need to be updated to reflect it.

- This also happens when the contents of existing TREE node types
   change or get expanded.

- The semantics encoded in TREE nodes usually are not in the best form
   to be used by debug formats.  This implies that the several sets of
   debug hooks need to do very similar transformations, which again
   will have to be adjusted/corrected if the TREE nodes change.

- And more...

In contrast, this is how LLVM supports several debug formats:

                                      +--> DWARF
      IR --> class DebugHandlerBase --+--> CodeView
                                      +--> BTF

i.e. LLVM gets debugging information as part of the IR, and then has
debug info backends in the form of instances of DebugHandlerBase,
which process that subset of the IR to produce whatever debug output.

To overcome the problems above, we thought about introducing a new set
of debug hooks, resulting in something like this:

                    +--> godump
                    +--> xcoffout
       debug_hooks -+--> vmsdbgout
                    +--> dbxout                        +--> DWARF
                    +--> dwarf2out --> n_debug_hooks --+--> BTF
                                         (walk)        +--> CTF
                                                       ... more ...

See how these "new debug hooks" are intended to be called by the DWARF
old debug hooks.  In this way:

- The internal DWARF representation becomes the canonical (and only)
   IR for debugging information in the compiler.  This is similar to
   what LLVM uses to implement support for DWARF, BTF and the Microsoft
   debug format.

- Debug formats (like CTF, BTF, stabs, etc) are implemented to provide
   a very simple API that traverses the DWARF DIE trees available in
   dwarf2out.

- The semantics expressed in the DWARF DIEs, which have been already
   extracted from the TREE nodes, are free of many internal details and
   more suitable to be easily translated into whatever abstractions the
   debug formats require.
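
To make the shape of this concrete, here is a minimal sketch in C,
written for illustration and not taken from the patches.  It assumes a
toy DIE tree (struct toy_die) and a toy "debug format" that is just a
visitor callback; none of these names exist in GCC, and the real
dwarf2out data structures are considerably richer:

```c
/* Toy model only: hypothetical types, not GCC's internal DIE
   representation or its debug-format interface.  */
#include <assert.h>
#include <stddef.h>

enum toy_tag
{
  TOY_TAG_COMPILE_UNIT,
  TOY_TAG_BASE_TYPE,
  TOY_TAG_STRUCT_TYPE,
  TOY_TAG_MEMBER
};

struct toy_die
{
  enum toy_tag tag;
  const char *name;
  struct toy_die *child;     /* first child, or NULL */
  struct toy_die *sibling;   /* next sibling, or NULL */
};

/* A "debug format" here is a callback invoked once per DIE, plus some
   private state owned by that format.  */
struct toy_debug_format
{
  const char *name;
  void (*visit) (const struct toy_die *die, void *state);
  void *state;
};

/* Depth-first walk over a DIE and all of its siblings and children.  */
void
toy_walk (const struct toy_die *die, const struct toy_debug_format *fmt)
{
  assert (fmt != NULL && fmt->visit != NULL);
  for (; die != NULL; die = die->sibling)
    {
      fmt->visit (die, fmt->state);
      toy_walk (die->child, fmt);
    }
}

/* A type-only format: it looks at type DIEs and ignores the rest.  */
void
toy_count_types (const struct toy_die *die, void *state)
{
  size_t *count = state;
  if (die->tag == TOY_TAG_BASE_TYPE || die->tag == TOY_TAG_STRUCT_TYPE)
    ++*count;
}

/* Build a tiny tree (a unit with "int" and "struct point { x }") and
   run the toy format over it.  Returns the number of types seen.  */
size_t
toy_demo (void)
{
  struct toy_die member = { TOY_TAG_MEMBER, "x", NULL, NULL };
  struct toy_die strct = { TOY_TAG_STRUCT_TYPE, "point", &member, NULL };
  struct toy_die base = { TOY_TAG_BASE_TYPE, "int", NULL, &strct };
  struct toy_die cu = { TOY_TAG_COMPILE_UNIT, "demo.c", &base, NULL };
  size_t count = 0;
  struct toy_debug_format fmt = { "toy-types", toy_count_types, &count };

  toy_walk (&cu, &fmt);
  return count;
}
```

In this toy model, adding another format means supplying another
callback; the walk and the tree stay untouched, which is the property
the list above is after.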

To avoid misunderstandings, we came to refer to these "new debug hooks"
simply as "debug formats".

In this patch series we are using this latter approach in order to
support both CTF and BTF, and we can say we are happy about using the
internal DWARF DIEs as a source instead of TREE nodes: it led to a
more natural implementation, much easier to understand.  This sort of
confirms in practice that the approach is sound.

I agree it looks like it worked out - now I suggested this approach originally
so let's try to get OK from somebody else also involved with debug.  Jason?

Looks good to me, too.

Jason
