[v2: https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675241.html
 Changes from v2:
 - Change BTF format to match what is currently in use by clang, pahole and
   the linux kernel.  The format in prior versions of this series was a new
   format meant to address issues with the existing one.  However, during
   discussion at LSFMM/BPF in March, it was decided that it is not desirable
   to change the BTF format at this time, and the issues are not problematic
   in practice for current use cases.  Therefore this version of the series
   reverts to the 'old' BTF format, where type_tag can only be represented
   on pointer types.  This 'old' format is described below.
 - Address review comments on v2, including new patch 6 with tests for some
   BPF-target specific interactions.  ]

This patch series adds support for the btf_decl_tag and btf_type_tag attributes
to GCC. This entails:

- Two new C-family attributes that associate ("tag") particular declarations
  and types with arbitrary strings. As explained below, this is intended to
  be used to, for example, characterize certain pointer types.  A single
  declaration or type may have multiple occurrences of these attributes.

- The conveyance of that information in the DWARF output in the form of a new
  DIE: DW_TAG_GNU_annotation, and a new attribute: DW_AT_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new
  kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. These BTF
  kinds are already supported by LLVM and other tools in the BPF ecosystem.

Both of these attributes are already supported by clang, and are already being
used in various ways by BPF support inside the Linux kernel.

Although the Linux kernel and BPF/BTF are the motivating use case for this
feature, the format of the new DWARF extension is generic.  This
work could be easily adapted to provide a general way for program authors to
annotate types and declarations with arbitrary information for any
post-compilation analysis needs, not just the Linux kernel BPF verifier.  For
example, these annotations could be used to aid in ABI analysis.

Purpose
=======

1)  Addition of C-family language constructs (attributes) to specify free-text
    tags on certain language elements, such as struct fields.

    The purpose of these annotations is to provide additional information about
    types, variables, and function parameters of interest to the kernel. A
    driving use case is to tag pointer types within the Linux kernel and BPF
    programs with additional semantic information, such as '__user' or '__rcu'.

    For example, consider the Linux kernel function do_execve with the
    following declaration:

      static int do_execve(struct filename *filename,
         const char __user *const __user *__argv,
         const char __user *const __user *__envp);

    Here, __user could be defined with these annotations to record semantic
    information about the pointer parameters (e.g., they are user-provided) in
    DWARF and BTF information. Other kernel facilities such as the BPF verifier
    can read the tags and make use of the information.

2)  Conveying the tags in the generated DWARF debug info.

    The main motivation for emitting the tags in DWARF is that the Linux kernel
    generates its BTF information via pahole, using DWARF as a source:

        +--------+  BTF                  BTF   +----------+
        | pahole |-------> vmlinux.btf ------->| verifier |
        +--------+                             +----------+
            ^                                        ^
            |                                        |
      DWARF |                                    BTF |
            |                                        |
         vmlinux                              +-------------+
         module1.ko                           | BPF program |
         module2.ko                           +-------------+
           ...

    This is because:

    a)  Unlike GCC, LLVM will only generate BTF for BPF programs.

    b)  GCC can generate BTF for any target with -gbtf, but there is no
        support for linking/deduplicating BTF in the linker.

    c)  pahole injects additional BTF information based on specific knowledge
        of kernel objects which is not available to the compiler.

    In the scenario above, the verifier needs access to the pointer tags of
    both the kernel types/declarations (conveyed in the DWARF and translated
    to BTF by pahole) and those of the BPF program (available directly in BTF).

    Another motivation for having the tag information in DWARF, unrelated to
    BPF and BTF, is that the drgn project (another DWARF consumer) also wants
    to benefit from these tags in order to differentiate between different
    kinds of pointers in the kernel.

3)  Conveying the tags in the generated BTF debug info.

    This is easy: the main purpose of having this info in BTF is for the
    compiled BPF programs. The kernel verifier can then access the tags
    of pointers used by the BPF programs.

For more information about these tags and the motivation behind them, please
refer to the following Linux kernel discussions: [1], [2], [3].
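
As a rough illustration of item 1 above, the sketch below shows one way a
'__user' annotation might be defined in terms of btf_type_tag.  The
BTF_TYPE_TAG macro, the __has_attribute fallback and the helper function
count_user_strings are assumptions made for this example only; they are not
part of this series.

```c
#include <assert.h>
#include <stddef.h>

/* Expand to the attribute only when the compiler knows it; otherwise
   expand to nothing so the code still builds (assumption of this sketch).  */
#if defined(__has_attribute)
# if __has_attribute(btf_type_tag)
#  define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value)))
# endif
#endif
#ifndef BTF_TYPE_TAG
# define BTF_TYPE_TAG(value)
#endif

#define __user BTF_TYPE_TAG(user)

/* Hypothetical helper with the same parameter shape as do_execve's
   __argv: both pointer levels carry the "user" type tag.  */
static size_t
count_user_strings (const char __user *const __user *argv)
{
  size_t n = 0;
  assert (argv != NULL);
  while (argv[n] != NULL)
    n++;
  return n;
}
```

The tag does not change code generation; it only travels with the pointer
types into the debug info, where tools such as the BPF verifier can read it.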

DWARF Representation
====================

Compared to prior iterations of this work, this patch series introduces a new
DWARF representation meant to address issues in the previously proposed format.
The format is detailed below.

Note that the obvious solution of introducing a new DIE to be chained in type
chains similar to type modifiers like const and volatile is not feasible
because it would break DWARF readers.

New DWARF extension: DW_TAG_GNU_annotation.  These DIEs encode the annotation
information.  They exist near the top level of the DIE tree as children of the
compilation unit DIE.  The user-supplied annotations ("tags") are encoded via
DW_AT_name and DW_AT_const_value.  DW_AT_name holds the name of the attribute
which is the source of the annotation (currently only "btf_type_tag" or
"btf_decl_tag").  DW_AT_const_value holds the arbitrary user string from the
attribute argument.

  DW_TAG_GNU_annotation
    DW_AT_name: "btf_decl_tag" or "btf_type_tag"
    DW_AT_const_value: <arbitrary user-provided string from attribute arg>
    DW_AT_GNU_annotation: see below.

New DWARF extension: DW_AT_GNU_annotation.  If present, the
DW_AT_GNU_annotation attribute is a reference to a DW_TAG_GNU_annotation DIE
holding annotations for the object.

If a single declaration or type at the language level has multiple occurrences
of btf_decl_tag or btf_type_tag attributes, then the DW_TAG_GNU_annotation DIE
referenced by that object will itself have DW_AT_GNU_annotation referring to
another annotation DIE.  In this way the annotation DIEs are chained together.

Multiple distinct declarations or types may refer via DW_AT_GNU_annotation to
the same DW_TAG_GNU_annotation DIE, if they share the same tags.
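
The chaining described above can be pictured as a singly linked list.  The
following is a minimal sketch of how a DWARF consumer might collect all tags
of an object, assuming a simplified in-memory model of the DIEs.  The struct
annotation_die type and the collect_tags function are hypothetical names for
this illustration and do not correspond to any GCC or libdw interface.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of a DW_TAG_GNU_annotation DIE (not a real API).  */
struct annotation_die
{
  const char *name;                   /* DW_AT_name */
  const char *value;                  /* DW_AT_const_value */
  const struct annotation_die *next;  /* DW_AT_GNU_annotation, or NULL */
};

/* Follow the DW_AT_GNU_annotation chain starting at FIRST and store up
   to MAX tag strings in OUT.  Returns the total length of the chain.  */
static size_t
collect_tags (const struct annotation_die *first, const char **out,
              size_t max)
{
  size_t n = 0;
  assert (out != NULL || max == 0);
  for (const struct annotation_die *d = first; d != NULL; d = d->next)
    {
      if (n < max)
        out[n] = d->value;
      n++;
    }
  return n;
}
```

Because a chain is only ever followed forwards, two objects can point at the
same annotation DIE without affecting each other.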

For more information on this format, please refer to recent talks at GNU Tools
Cauldron [4] and Linux Plumbers Conference [5]. Older iterations of this work
and related discussions may be found in [6,7,8].

BTF Representation
==================

In BTF, BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG records convey the annotations.
Both records hold the annotation value in their name field.

BTF_KIND_DECL_TAG records refer to the annotated object by BTF ID.  Each
DECL_TAG record is followed by an additional 32-bit 'component_idx', which
indicates to which component of an object the tag applies.  If the annotated
object is a struct, union, or function, then 'component_idx' is the 0-based
index of the member or function parameter to which the tag applies.  If the
annotated object is a variable, or if it is a function and the tag applies to
the function declaration itself (rather than a parameter), then 'component_idx'
is -1.
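
To make the record layout concrete, here is a small sketch of a DECL_TAG
record.  The struct layouts are modelled on the Linux UAPI <linux/btf.h>
header but are re-declared with a sketch_ prefix so that the example does
not depend on kernel headers; make_decl_tag and btf_kind_of are hypothetical
helpers, and the string offsets and BTF IDs used with them are illustrative
values only.

```c
#include <assert.h>
#include <stdint.h>

#define BTF_KIND_DECL_TAG 17  /* value used by the Linux UAPI btf.h */

struct sketch_btf_type
{
  uint32_t name_off;  /* offset of the tag string in the string section */
  uint32_t info;      /* bits 24-28 hold the kind */
  uint32_t type;      /* BTF ID of the annotated object */
};

struct sketch_btf_decl_tag
{
  struct sketch_btf_type hdr;
  int32_t component_idx;  /* member or parameter index, or -1 */
};

/* Build a DECL_TAG record referring to TARGET_ID.  */
static struct sketch_btf_decl_tag
make_decl_tag (uint32_t name_off, uint32_t target_id, int32_t component_idx)
{
  struct sketch_btf_decl_tag tag;
  assert (component_idx >= -1);
  tag.hdr.name_off = name_off;
  tag.hdr.info = (uint32_t) BTF_KIND_DECL_TAG << 24;
  tag.hdr.type = target_id;
  tag.component_idx = component_idx;
  return tag;
}

/* Extract the kind from the info word of a record.  */
static unsigned
btf_kind_of (const struct sketch_btf_type *t)
{
  return (t->info >> 24) & 0x1f;
}
```

For instance, make_decl_tag (off, 10, -1) would describe a tag on a variable
with BTF ID 10, while make_decl_tag (off, 1, 1) would describe a tag on the
second member of the struct with BTF ID 1.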

BTF_KIND_TYPE_TAG records form part of the type chain.  Currently the BTF
format can only represent type tags applied to pointer types; type tags applied
to any non-pointer type cannot be represented in BTF.  For type tags applied to
pointer types, the BTF_KIND_PTR refers to the TYPE_TAG by ID, and the TYPE_TAG
refers to the pointee type by ID.
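
The PTR -> TYPE_TAG -> pointee arrangement can be sketched with a toy record
table, where the array index plays the role of the BTF ID and ID 0 stands for
void.  The sketch_rec type, the sketch_kind enumeration and the
pointee_and_tags function are hypothetical and greatly simplified compared
to real BTF records.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_kind { SK_VOID, SK_INT, SK_STRUCT, SK_PTR, SK_TYPE_TAG };

/* Toy BTF record: a kind, a name and the ID of the referenced type.  */
struct sketch_rec
{
  enum sketch_kind kind;
  const char *name;
  unsigned type_id;
};

/* Given the ID of a PTR record, skip any TYPE_TAG records that follow it
   in the type chain, storing their names in TAGS (up to MAX).  Returns
   the ID of the pointee and writes the number of tags seen to *NTAGS.  */
static unsigned
pointee_and_tags (const struct sketch_rec *recs, unsigned ptr_id,
                  const char **tags, size_t max, size_t *ntags)
{
  unsigned id;
  size_t n = 0;
  assert (recs[ptr_id].kind == SK_PTR);
  id = recs[ptr_id].type_id;
  while (recs[id].kind == SK_TYPE_TAG)
    {
      if (n < max)
        tags[n] = recs[id].name;
      n++;
      id = recs[id].type_id;
    }
  *ntags = n;
  return id;
}
```

A consumer that does not care about tags can use the same loop and simply
ignore the collected names, which is what makes the chain representation
straightforward to skip over.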

Example: btf_decl_tag
=====================

Consider the following declarations:

  int  *x __attribute__((btf_decl_tag ("rw"), btf_decl_tag ("devicemem")));
  struct {
    int size;
    char *ptr __attribute__((btf_decl_tag("rw")));
  } y;

These declarations produce the following DWARF information:

 <1><1e>: Abbrev Number: 3 (DW_TAG_variable)
    <1f>   DW_AT_name        : x
    <24>   DW_AT_type        : <0x36>
    <28>   DW_AT_GNU_annotation: <0x4a>
    ...
 <1><36>: Abbrev Number: 1 (DW_TAG_pointer_type)
    <37>   DW_AT_byte_size   : 8
    <37>   DW_AT_type        : <0x3b>
 <1><3b>: Abbrev Number: 4 (DW_TAG_base_type)
    <3e>   DW_AT_name        : int
    ...
 <1><42>: Abbrev Number: 5 (DW_TAG_GNU_annotation)
    <43>   DW_AT_name        : (indirect string, offset: 0): btf_decl_tag
    <47>   DW_AT_const_value : rw
 <1><4a>: Abbrev Number: 6 (DW_TAG_GNU_annotation)
    <4b>   DW_AT_name        : (indirect string, offset: 0): btf_decl_tag
    <4f>   DW_AT_const_value : (indirect string, offset: 0x1f): devicemem
    <53>   DW_AT_GNU_annotation: <0x42>
 <1><57>: Abbrev Number: 7 (DW_TAG_structure_type)
    ...
 <2><60>: Abbrev Number: 8 (DW_TAG_member)
    <61>   DW_AT_name        : (indirect string, offset: 0x1a): size
    <68>   DW_AT_type        : <0x3b>
    ...
 <2><6d>: Abbrev Number: 9 (DW_TAG_member)
    <6e>   DW_AT_name        : ptr
    <75>   DW_AT_type        : <0x7f>
    <7a>   DW_AT_GNU_annotation: <0x42>
    ...
 <2><7e>: Abbrev Number: 0
 <1><7f>: Abbrev Number: 1 (DW_TAG_pointer_type)
    <80>   DW_AT_byte_size   : 8
    <80>   DW_AT_type        : <0x84>
 <1><84>: Abbrev Number: 10 (DW_TAG_base_type)
    <85>   DW_AT_byte_size   : 1
    <86>   DW_AT_encoding    : 6        (signed char)
    <87>   DW_AT_name        : (indirect string, offset: 0x5e): char
 <1><8b>: Abbrev Number: 11 (DW_TAG_variable)
    <8c>   DW_AT_name        : y
    <91>   DW_AT_type        : <0x57>
    ...

The variable DIE for 'x' refers by DW_AT_GNU_annotation to the DIE holding the
annotation for the "devicemem" tag, which in turn refers to the DIE holding
the annotation for "rw".  The DW_TAG_member DIE for the member 'ptr' of the
struct refers to the annotation DIE for "rw" directly, which is thereby shared
between the two declarations.

And BTF information:

  [1] STRUCT '(anon)' size=16 vlen=2
      'size' type_id=2 bits_offset=0
      'ptr' type_id=3 bits_offset=64
  [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
  [3] PTR '(anon)' type_id=4
  [4] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
  [5] PTR '(anon)' type_id=2
  [6] DECL_TAG 'devicemem' type_id=10 component_idx=-1
  [7] DECL_TAG 'rw' type_id=10 component_idx=-1
  [8] DECL_TAG 'rw' type_id=1 component_idx=1
  [9] VAR 'y' type_id=1, linkage=global
  [10] VAR 'x' type_id=5, linkage=global

Note how the component_idx identifies to which member of the struct type the
decl tag is applied.


Example: btf_type_tag
=====================

Consider the following code snippet:

  int __attribute__((btf_type_tag("rcu"), btf_type_tag ("foo"))) x;

  void
  do_thing (struct S * __attribute__((btf_type_tag ("rcu"))) rcu_s,
            void * __attribute__((btf_type_tag("foo"))) ptr)
  { ... }

The relevant DWARF information produced is as follows:

 <1><2e>: Abbrev Number: 3 (DW_TAG_structure_type)
    <2f>   DW_AT_name        : S
    ...
 ...
 <1><4d>: Abbrev Number: 6 (DW_TAG_variable)
    <4e>   DW_AT_name        : x
    <53>   DW_AT_type        : <0x61>
    ...
 <1><61>: Abbrev Number: 7 (DW_TAG_base_type)
    <62>   DW_AT_byte_size   : 4
    <63>   DW_AT_encoding    : 5        (signed)
    <64>   DW_AT_name        : int
    <68>   DW_AT_GNU_annotation: <0x75>
 <1><6c>: Abbrev Number: 1 (DW_TAG_GNU_annotation)
    <6d>   DW_AT_name        : (indirect string, offset: 0x13): btf_type_tag
    <71>   DW_AT_const_value : rcu
 <1><75>: Abbrev Number: 8 (DW_TAG_GNU_annotation)
    <76>   DW_AT_name        : (indirect string, offset: 0x13): btf_type_tag
    <7a>   DW_AT_const_value : foo
    <7e>   DW_AT_GNU_annotation: <0x6c>
 <1><82>: Abbrev Number: 9 (DW_TAG_subprogram)
    <83>   DW_AT_name        : (indirect string, offset: 0x20): do_thing
    ...
 <2><a1>: Abbrev Number: 10 (DW_TAG_formal_parameter)
    <a2>   DW_AT_name        : (indirect string, offset: 0x5): rcu_s
    <a9>   DW_AT_type        : <0xc0>
    ...
 <2><b0>: Abbrev Number: 11 (DW_TAG_formal_parameter)
    <b1>   DW_AT_name        : ptr
    <b8>   DW_AT_type        : <0xca>
    ...
 <2><bf>: Abbrev Number: 0
 <1><c0>: Abbrev Number: 12 (DW_TAG_pointer_type)
    <c1>   DW_AT_byte_size   : 8
    <c2>   DW_AT_type        : <0x2e>
    <c6>   DW_AT_GNU_annotation: <0x6c>
 <1><ca>: Abbrev Number: 13 (DW_TAG_pointer_type)
    <cb>   DW_AT_byte_size   : 8
    <cc>   DW_AT_GNU_annotation: <0xd0>
 <1><d0>: Abbrev Number: 1 (DW_TAG_GNU_annotation)
    <d1>   DW_AT_name        : (indirect string, offset: 0x13): btf_type_tag
    <d5>   DW_AT_const_value : foo

Note how in this case, two annotation DIEs for "foo" are produced, because
it is used in two distinct sets of type tags which do not allow it to be
shared. The DIE for "rcu", however, is shared between uses.
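
One way to picture this sharing rule is as interning of (value, next) pairs:
an annotation node can be reused only when both its string and the remainder
of its chain are identical.  The sketch below illustrates that idea with a
small fixed-size pool.  It is an assumption made for explanation and is not
necessarily how the patches implement the sharing; struct anno, pool and
intern_anno are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct anno
{
  const char *value;        /* DW_AT_const_value */
  const struct anno *next;  /* DW_AT_GNU_annotation, or NULL */
};

#define MAX_ANNOS 16
static struct anno pool[MAX_ANNOS];
static size_t pool_used;

/* Return a node for (VALUE, NEXT), reusing an existing node only when
   both the string and the rest of the chain match.  */
static const struct anno *
intern_anno (const char *value, const struct anno *next)
{
  for (size_t i = 0; i < pool_used; i++)
    if (pool[i].next == next && strcmp (pool[i].value, value) == 0)
      return &pool[i];
  assert (pool_used < MAX_ANNOS);
  pool[pool_used].value = value;
  pool[pool_used].next = next;
  return &pool[pool_used++];
}
```

Under this model the chain for 'x' is ("foo", -> "rcu"), the chain for the
pointer type of 'rcu_s' is ("rcu", NULL) and the chain for the pointer type
of 'ptr' is ("foo", NULL).  The two "rcu" nodes compare equal and are
shared, while the two "foo" nodes differ in their next link and so remain
distinct.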

And BTF information:

  [1] STRUCT 'S' size=8 vlen=1
      ...
  [2] INT 'long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED
  [3] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
  [4] FUNC_PROTO '(anon)' ret_type_id=0 vlen=2
      'rcu_s' type_id=5
      'ptr' type_id=6
  [5] PTR '(anon)' type_id=9
  [6] PTR '(anon)' type_id=10
  [7] VAR 'x' type_id=3, linkage=global
  [8] FUNC 'do_thing' type_id=4 linkage=global
  [9] TYPE_TAG 'rcu' type_id=1
  [10] TYPE_TAG 'foo' type_id=0

Note how the TYPE_TAG records are injected into the type chain between the
PTR record and the pointee type, e.g.

  param 'rcu_s' -> PTR -> TYPE_TAG 'rcu' -> STRUCT 'S'

Note also that the type tags which apply to the integer type of variable 'x'
are not represented, since BTF currently has no way to represent type tags
on non-pointer types.

References
==========

[1] https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
[2] https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
[3] https://lore.kernel.org/bpf/20211112012604.1504583-1-...@fb.com/
[4] https://gcc.gnu.org/wiki/cauldron2024#cauldron2024talks.what_is_new_in_the_bpf_support_in_the_gnu_toolchain
[5] https://lpc.events/event/18/contributions/1924/
[6] https://gcc.gnu.org/pipermail/gcc-patches/2022-April/592685.html
[7] https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596355.html
[8] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624156.html

David Faust (6):
  c-family: add btf_type_tag and btf_decl_tag attributes
  dwarf: create annotation DIEs for btf tags
  ctf: translate annotation DIEs to internal ctf
  btf: generate and output DECL_TAG and TYPE_TAG records
  doc: document btf_type_tag and btf_decl_tag attributes
  bpf: add tests for CO-RE and BTF tag interaction

 gcc/btfout.cc                                 | 171 +++++++++--
 gcc/c-family/c-attribs.cc                     |  25 +-
 gcc/ctfc.cc                                   |  80 +++++-
 gcc/ctfc.h                                    |  43 ++-
 gcc/doc/extend.texi                           |  79 +++++
 gcc/dwarf2ctf.cc                              | 135 ++++++++-
 gcc/dwarf2out.cc                              | 270 +++++++++++++++++-
 .../gcc.dg/debug/btf/btf-decl-tag-1.c         |  14 +
 .../gcc.dg/debug/btf/btf-decl-tag-2.c         |  22 ++
 .../gcc.dg/debug/btf/btf-decl-tag-3.c         |  22 ++
 .../gcc.dg/debug/btf/btf-decl-tag-4.c         |  34 +++
 .../gcc.dg/debug/btf/btf-type-tag-1.c         |  26 ++
 .../gcc.dg/debug/btf/btf-type-tag-2.c         |  13 +
 .../gcc.dg/debug/btf/btf-type-tag-3.c         |  28 ++
 .../gcc.dg/debug/btf/btf-type-tag-4.c         |  24 ++
 .../gcc.dg/debug/btf/btf-type-tag-c2x-1.c     |  22 ++
 .../gcc.dg/debug/ctf/ctf-decl-tag-1.c         |  31 ++
 .../gcc.dg/debug/ctf/ctf-type-tag-1.c         |  19 ++
 .../debug/dwarf2/dwarf-btf-decl-tag-1.c       |  11 +
 .../debug/dwarf2/dwarf-btf-decl-tag-2.c       |  25 ++
 .../debug/dwarf2/dwarf-btf-decl-tag-3.c       |  21 ++
 .../debug/dwarf2/dwarf-btf-type-tag-1.c       |  10 +
 .../debug/dwarf2/dwarf-btf-type-tag-2.c       |  31 ++
 .../debug/dwarf2/dwarf-btf-type-tag-3.c       |  15 +
 .../debug/dwarf2/dwarf-btf-type-tag-4.c       |  33 +++
 .../debug/dwarf2/dwarf-btf-type-tag-5.c       |  10 +
 gcc/testsuite/gcc.target/bpf/core-btf-tag-1.c |  23 ++
 gcc/testsuite/gcc.target/bpf/core-btf-tag-2.c |  23 ++
 include/btf.h                                 |  14 +
 include/ctf.h                                 |   4 +
 include/dwarf2.def                            |   4 +
 31 files changed, 1240 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-4.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-4.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-c2x-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-decl-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-type-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-4.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-5.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-btf-tag-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-btf-tag-2.c

-- 
2.47.2
