Gentle ping for this series.
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675241.html

On 2/6/25 11:54, David Faust wrote:
> [v1: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666911.html
>  Changes from v1:
>  - Fix a bug in v1 related to generating DWARF for type tags applied to
>    struct or union types, especially if the type had multiple type tags
>    or was also part of a typedef.
>  - Simplified the dwarf2ctf translation of types having both cv-qualifiers
>    and type tags applied to them.
>  - Add a few new tests.
>  - Address review comments from v1.  ]
> 
> This patch series adds support for the btf_decl_tag and btf_type_tag
> attributes to GCC. This entails:
> 
> - Two new C-family attributes that allow associating (i.e. "tagging")
>   particular declarations and types with arbitrary strings. As explained
>   below, this is intended to be used to, for example, characterize certain
>   pointer types.  A single declaration or type may have multiple occurrences
>   of these attributes.
> 
> - The conveyance of that information in the DWARF output in the form of a new
>   DIE: DW_TAG_GNU_annotation, and a new attribute: DW_AT_GNU_annotation.
> 
> - The conveyance of that information in the BTF output in the form of two new
>   kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. These BTF
>   kinds are already supported by LLVM and other tools in the BPF ecosystem.
> 
> Both of these attributes are already supported by clang, and are beginning to
> be used in various ways by BPF users and inside the Linux kernel.
> 
> It is worth noting that while the Linux kernel and BPF/BTF are the motivating
> use case for this feature, the format of the new DWARF extension is generic.
> This work could be easily adapted to provide a general way for program
> authors to annotate types and declarations with arbitrary information for any
> post-compilation analysis needs, not just the Linux kernel BPF verifier.  For
> example, these annotations could be used to aid in ABI analysis.
> 
> Purpose
> =======
> 
> 1)  Addition of C-family language constructs (attributes) to specify free-text
>     tags on certain language elements, such as struct fields.
> 
>     The purpose of these annotations is to provide additional information
>     about types, variables, and function parameters of interest to the
>     kernel. A driving use case is to tag pointer types within the Linux
>     kernel and BPF programs with additional semantic information, such as
>     '__user' or '__rcu'.
> 
>     For example, consider the Linux kernel function do_execve with the
>     following declaration:
> 
>       static int do_execve(struct filename *filename,
>          const char __user *const __user *__argv,
>          const char __user *const __user *__envp);
> 
>     Here, __user could be defined with these annotations to record semantic
>     information about the pointer parameters (e.g., they are user-provided) in
>     DWARF and BTF information. Other kernel facilities such as the BPF
>     verifier can read the tags and make use of the information.
> 
> 2)  Conveying the tags in the generated DWARF debug info.
> 
>     The main motivation for emitting the tags in DWARF is that the Linux
>     kernel generates its BTF information via pahole, using DWARF as a source:
> 
>         +--------+  BTF                  BTF   +----------+
>         | pahole |-------> vmlinux.btf ------->| verifier |
>         +--------+                             +----------+
>             ^                                        ^
>             |                                        |
>       DWARF |                                    BTF |
>             |                                        |
>          vmlinux                              +-------------+
>          module1.ko                           | BPF program |
>          module2.ko                           +-------------+
>            ...
> 
>     This is because:
> 
>     a)  Unlike GCC, LLVM will only generate BTF for BPF programs.
> 
>     b)  GCC can generate BTF for whatever target with -gbtf, but there is no
>         support for linking/deduplicating BTF in the linker.
> 
>     c)  pahole injects additional BTF information based on specific knowledge
>         of kernel objects which is not available to the compiler.
> 
>     In the scenario above, the verifier needs access to the pointer tags of
>     both the kernel types/declarations (conveyed in the DWARF and translated
>     to BTF by pahole) and those of the BPF program (available directly in
>     BTF).
> 
>     Another motivation for having the tag information in DWARF, unrelated to
>     BPF and BTF, is that the drgn project (another DWARF consumer) also wants
>     to benefit from these tags in order to differentiate between different
>     kinds of pointers in the kernel.
> 
> 3)  Conveying the tags in the generated BTF debug info.
> 
>     This is easy: the main purpose of having this info in BTF is for the
>     compiled BPF programs. The kernel verifier can then access the tags
>     of pointers used by the BPF programs.
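
For item 1) above, a minimal sketch of how a marker such as __user might be layered on top of btf_type_tag. The BTF_TYPE_TAG helper macro and the count_args function are illustrative names that do not come from this series. The __has_attribute guard is an assumption made so the sketch still compiles on compilers that lack the attribute, in which case the markers expand to nothing.

```c
#include <stddef.h>

/* Fall back gracefully on compilers that lack __has_attribute. */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* BTF_TYPE_TAG is an illustrative helper name, not defined by this series. */
#if __has_attribute(btf_type_tag)
# define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value)))
#else
# define BTF_TYPE_TAG(value)
#endif

#define __user BTF_TYPE_TAG(user)
#define __rcu  BTF_TYPE_TAG(rcu)

/* Hypothetical function whose parameter has the same shape as the
   __argv parameter of do_execve.  It counts entries up to the NULL
   terminator. */
size_t count_args(const char __user *const __user *argv)
{
    size_t n = 0;
    while (argv != NULL && argv[n] != NULL)
        n++;
    return n;
}
```

The tags do not change how the function behaves; they only record extra information about the pointer types for debug info consumers.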
> 
> For more information about these tags and the motivation behind them, please
> refer to the following Linux kernel discussions: [1], [2], [3].
> 
> DWARF Representation
> ====================
> 
> Compared to prior iterations of this work, this patch series introduces a new
> DWARF representation meant to address issues in the previously proposed
> format. The format is detailed below.
> 
> Note that the obvious solution of introducing a new DIE to be chained in type
> chains similar to type modifiers like const and volatile is not feasible
> because it would break DWARF readers.
> 
> New DWARF extension: DW_TAG_GNU_annotation.  These DIEs encode the annotation
> information.  They exist near the top level of the DIE tree as children of the
> compilation unit DIE.  The user-supplied annotations ("tags") are encoded via
> DW_AT_name and DW_AT_const_value.  DW_AT_name holds the name of the attribute
> which is the source of the annotation (currently only "btf_type_tag" or
> "btf_decl_tag").  DW_AT_const_value holds the arbitrary user string from the
> attribute argument.
> 
>   DW_TAG_GNU_annotation
>     DW_AT_name: "btf_decl_tag" or "btf_type_tag"
>     DW_AT_const_value: <arbitrary user-provided string from attribute arg>
>     DW_AT_GNU_annotation: see below.
> 
> New DWARF extension: DW_AT_GNU_annotation.  If present, the
> DW_AT_GNU_annotation attribute is a reference to a DW_TAG_GNU_annotation DIE
> holding annotations for the object.
> 
> If a single declaration or type at the language level has multiple occurrences
> of the btf_decl_tag or btf_type_tag attribute, then the DW_TAG_GNU_annotation
> DIE referenced by that object will itself have DW_AT_GNU_annotation referring
> to another annotation DIE.  In this way the annotation DIEs are chained
> together.
> 
> Multiple distinct declarations or types may refer via DW_AT_GNU_annotation to
> the same DW_TAG_GNU_annotation DIE, if they share the same tags.
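
The chaining described above can be pictured as a singly linked list that a DWARF consumer walks. The following is a sketch under stated assumptions: struct annotation_die and both helper functions are hypothetical names for an in-memory model, not part of GCC, pahole, or any DWARF library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of a DW_TAG_GNU_annotation DIE. */
struct annotation_die {
    const char *name;                  /* DW_AT_name: attribute name      */
    const char *value;                 /* DW_AT_const_value: user string  */
    const struct annotation_die *next; /* DW_AT_GNU_annotation, or NULL   */
};

/* Count the annotations reachable from the DIE an object refers to. */
size_t annotation_chain_length(const struct annotation_die *die)
{
    size_t n = 0;
    for (; die != NULL; die = die->next)
        n++;
    return n;
}

/* Return 1 if the chain starting at die carries the given tag value. */
int annotation_chain_has(const struct annotation_die *die, const char *value)
{
    for (; die != NULL; die = die->next)
        if (strcmp(die->value, value) == 0)
            return 1;
    return 0;
}
```

With a chain "devicemem" -> "rw", an object that refers to the "devicemem" DIE carries both tags, while a second object that refers directly to the "rw" DIE carries only "rw" and shares that DIE with the first.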
> 
> For more information on this format, please refer to recent talks at GNU Tools
> Cauldron [4] and Linux Plumbers Conference [5]. Older iterations of this work
> and related discussions may be found in [6,7,8].
> 
> BTF Representation
> ==================
> 
> In BTF, BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG records convey the
> annotations. These records hold the annotation value in their name field, and
> refer to the annotated object by BTF ID.
> 
> BTF_KIND_DECL_TAG records are followed by an additional 32-bit
> 'component_idx', which indicates to which component of an object the tag
> applies.  This index is -1 if the tag applies to a variable or function
> declaration itself; otherwise it is a 0-based index indicating to which
> function argument or struct or union member the tag applies.
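
A rough sketch of the record shapes just described. The sketch_ prefixed types and functions are this example's own; they are modelled on the Linux kernel's BTF format, where the kind occupies bits 24-28 of the info word and DECL_TAG and TYPE_TAG are kinds 17 and 18.

```c
#include <stdint.h>

/* Kind numbers as used by the Linux kernel's BTF format. */
enum { SKETCH_BTF_KIND_DECL_TAG = 17, SKETCH_BTF_KIND_TYPE_TAG = 18 };

/* Common record header, simplified for this sketch. */
struct sketch_btf_type {
    uint32_t name_off; /* string-table offset of the annotation value */
    uint32_t info;     /* bits 24-28 hold the kind                    */
    uint32_t type;     /* BTF ID of the annotated object              */
};

/* Extra data that follows a DECL_TAG record only. */
struct sketch_btf_decl_tag {
    int32_t component_idx; /* -1: the declaration itself; else 0-based */
};

/* Extract the kind from the info word. */
uint32_t sketch_btf_kind(const struct sketch_btf_type *t)
{
    return (t->info >> 24) & 0x1f;
}

/* Return 1 if a DECL_TAG applies to the declaration as a whole. */
int sketch_decl_tag_is_whole_decl(const struct sketch_btf_decl_tag *d)
{
    return d->component_idx == -1;
}
```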
> 
> Example: btf_decl_tag
> =====================
> 
> Consider the following declarations:
> 
>   int  *x __attribute__((btf_decl_tag ("rw"), btf_decl_tag ("devicemem")));
>   struct {
>     int size;
>     char *ptr __attribute__((btf_decl_tag("rw")));
>   } y;
> 
> These declarations produce the following DWARF information:
> 
>  <1><1e>: Abbrev Number: 3 (DW_TAG_variable)
>     <1f>   DW_AT_name        : x
>     <24>   DW_AT_type        : <0x36>
>     <28>   DW_AT_GNU_annotation: <0x4a>
>     ...
>  <1><36>: Abbrev Number: 1 (DW_TAG_pointer_type)
>     <37>   DW_AT_byte_size   : 8
>     <37>   DW_AT_type        : <0x3b>
>  <1><3b>: Abbrev Number: 4 (DW_TAG_base_type)
>     <3e>   DW_AT_name        : int
>     ...
>  <1><42>: Abbrev Number: 5 (DW_TAG_GNU_annotation)
>     <43>   DW_AT_name        : (indirect string, offset: 0): btf_decl_tag
>     <47>   DW_AT_const_value : rw
>  <1><4a>: Abbrev Number: 6 (DW_TAG_GNU_annotation)
>     <4b>   DW_AT_name        : (indirect string, offset: 0): btf_decl_tag
>     <4f>   DW_AT_const_value : (indirect string, offset: 0x1f): devicemem
>     <53>   DW_AT_GNU_annotation: <0x42>
>  <1><57>: Abbrev Number: 7 (DW_TAG_structure_type)
>     ...
>  <2><60>: Abbrev Number: 8 (DW_TAG_member)
>     <61>   DW_AT_name        : (indirect string, offset: 0x1a): size
>     <68>   DW_AT_type        : <0x3b>
>     ...
>  <2><6d>: Abbrev Number: 9 (DW_TAG_member)
>     <6e>   DW_AT_name        : ptr
>     <75>   DW_AT_type        : <0x7f>
>     <7a>   DW_AT_GNU_annotation: <0x42>
>     ...
>  <2><7e>: Abbrev Number: 0
>  <1><7f>: Abbrev Number: 1 (DW_TAG_pointer_type)
>     <80>   DW_AT_byte_size   : 8
>     <80>   DW_AT_type        : <0x84>
>  <1><84>: Abbrev Number: 10 (DW_TAG_base_type)
>     <85>   DW_AT_byte_size   : 1
>     <86>   DW_AT_encoding    : 6      (signed char)
>     <87>   DW_AT_name        : (indirect string, offset: 0x5e): char
>  <1><8b>: Abbrev Number: 11 (DW_TAG_variable)
>     <8c>   DW_AT_name        : y
>     <91>   DW_AT_type        : <0x57>
>     ...
> 
> The variable DIE for 'x' refers by DW_AT_GNU_annotation to the DIE holding the
> annotation for the "devicemem" tag, which in turn refers to the DIE holding
> the annotation for "rw".  The DW_TAG_member DIE for the member 'ptr' of the
> struct refers to the annotation DIE for "rw" directly, which is thereby shared
> between the two declarations.
> 
> And BTF information:
> 
>   [1] STRUCT '(anon)' size=16 vlen=2
>       'size' type_id=2 bits_offset=0
>       'ptr' type_id=3 bits_offset=64
>   [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>   [3] PTR '(anon)' type_id=4
>   [4] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
>   [5] PTR '(anon)' type_id=2
>   [6] DECL_TAG 'devicemem' type_id=10 component_idx=-1
>   [7] DECL_TAG 'rw' type_id=10 component_idx=-1
>   [8] DECL_TAG 'rw' type_id=1 component_idx=1
>   [9] VAR 'y' type_id=1, linkage=global
>   [10] VAR 'x' type_id=5, linkage=global
> 
> Note how the component_idx identifies to which member of the struct type the
> decl tag is applied.
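
To make the component_idx lookup concrete, here is a small sketch that resolves a DECL_TAG on a struct to the name it applies to. struct sketch_struct and decl_tag_target are hypothetical names for a simplified model; a real consumer would read these fields from the BTF section instead.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a BTF STRUCT and its members. */
struct sketch_struct {
    const char *name;           /* struct name, e.g. "(anon)"          */
    size_t vlen;                /* number of members                   */
    const char *const *members; /* member names, in declaration order  */
};

/* Name of what a DECL_TAG on this struct applies to.
   Returns NULL when the index is out of range. */
const char *decl_tag_target(const struct sketch_struct *s, int component_idx)
{
    if (component_idx == -1)
        return s->name;                 /* the struct itself */
    if (component_idx < 0 || (size_t)component_idx >= s->vlen)
        return NULL;                    /* malformed index */
    return s->members[component_idx];
}
```

For the listing above, record [8] has type_id=1 and component_idx=1, so the lookup on struct [1] with members 'size' and 'ptr' selects 'ptr'.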
> 
> 
> Example: btf_type_tag
> =====================
> 
> Consider the following code snippet:
> 
>   int __attribute__((btf_type_tag("rcu"), btf_type_tag ("foo"))) x;
> 
>   void
>   do_thing (struct S * __attribute__((btf_type_tag ("rcu"))) rcu_s,
>             void * __attribute__((btf_type_tag("foo"))) ptr)
>   { ... }
> 
> The relevant DWARF information produced is as follows:
> 
>  <1><2e>: Abbrev Number: 3 (DW_TAG_structure_type)
>     <2f>   DW_AT_name        : S
>     ...
>  <1><46>: Abbrev Number: 5 (DW_TAG_base_type)
>     <47>   DW_AT_byte_size   : 4
>     <48>   DW_AT_encoding    : 5      (signed)
>     <49>   DW_AT_name        : int
>  <1><4d>: Abbrev Number: 6 (DW_TAG_variable)
>     <4e>   DW_AT_name        : x
>     <53>   DW_AT_type        : <0x61>
>     ...
>  <1><61>: Abbrev Number: 7 (DW_TAG_base_type)
>     <62>   DW_AT_byte_size   : 4
>     <63>   DW_AT_encoding    : 5      (signed)
>     <64>   DW_AT_name        : int
>     <68>   DW_AT_GNU_annotation: <0x75>
>  <1><6c>: Abbrev Number: 1 (DW_TAG_GNU_annotation)
>     <6d>   DW_AT_name        : (indirect string, offset: 0x13): btf_type_tag
>     <71>   DW_AT_const_value : rcu
>  <1><75>: Abbrev Number: 8 (DW_TAG_GNU_annotation)
>     <76>   DW_AT_name        : (indirect string, offset: 0x13): btf_type_tag
>     <7a>   DW_AT_const_value : foo
>     <7e>   DW_AT_GNU_annotation: <0x6c>
>  <1><82>: Abbrev Number: 9 (DW_TAG_subprogram)
>     <83>   DW_AT_name        : (indirect string, offset: 0x20): do_thing
>     ...
>  <2><a1>: Abbrev Number: 10 (DW_TAG_formal_parameter)
>     <a2>   DW_AT_name        : (indirect string, offset: 0x5): rcu_s
>     <a9>   DW_AT_type        : <0xc0>
>     ...
>  <2><b0>: Abbrev Number: 11 (DW_TAG_formal_parameter)
>     <b1>   DW_AT_name        : ptr
>     <b8>   DW_AT_type        : <0xca>
>     ...
>  <2><bf>: Abbrev Number: 0
>  <1><c0>: Abbrev Number: 12 (DW_TAG_pointer_type)
>     <c1>   DW_AT_byte_size   : 8
>     <c2>   DW_AT_type        : <0x2e>
>     <c6>   DW_AT_GNU_annotation: <0x6c>
>  <1><ca>: Abbrev Number: 13 (DW_TAG_pointer_type)
>     <cb>   DW_AT_byte_size   : 8
>     <cc>   DW_AT_GNU_annotation: <0xd0>
>  <1><d0>: Abbrev Number: 1 (DW_TAG_GNU_annotation)
>     <d1>   DW_AT_name        : (indirect string, offset: 0x13): btf_type_tag
>     <d5>   DW_AT_const_value : foo
> 
> Note how in this case, two annotation DIEs for "foo" are produced, because
> it is used in two distinct sets of type tags which do not allow it to be
> shared. The DIE for "rcu", however, is shared between uses.
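
One way to picture the sharing rule is as interning keyed on both the tag string and the rest of the chain: a DIE can be reused only when its value and its successor both match. The sketch below illustrates that idea only. It is not GCC's implementation, and struct anno_node, the fixed-size pool, and intern_annotation are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an annotation DIE; not GCC's representation. */
struct anno_node {
    const char *value;            /* DW_AT_const_value             */
    const struct anno_node *next; /* DW_AT_GNU_annotation, or NULL */
};

#define MAX_NODES 16
static struct anno_node pool[MAX_NODES];
static size_t pool_used;

/* Return an existing node with the same value and the same successor,
   or create a new one.  Returns NULL when the pool is exhausted. */
const struct anno_node *
intern_annotation(const char *value, const struct anno_node *next)
{
    for (size_t i = 0; i < pool_used; i++)
        if (pool[i].next == next && strcmp(pool[i].value, value) == 0)
            return &pool[i];
    if (pool_used == MAX_NODES)
        return NULL;
    pool[pool_used].value = value;
    pool[pool_used].next = next;
    return &pool[pool_used++];
}
```

Under this model, "foo" chained to "rcu" and "foo" on its own are distinct nodes, while every request for a standalone "rcu" returns the same node, which matches the DIEs at 0x75, 0xd0 and 0x6c in the dump above.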
> 
> And BTF information:
> 
>   [1] FUNC_PROTO '(anon)' ret_type_id=0 vlen=2
>       'rcu_s' type_id=2
>       'ptr' type_id=6
>   [2] TYPE_TAG 'rcu' type_id=3
>   [3] PTR '(anon)' type_id=4
>   [4] STRUCT 'S' size=4 vlen=1
>       ...
>   [6] TYPE_TAG 'foo' type_id=7
>   [7] PTR '(anon)' type_id=0
>   [8] TYPE_TAG 'foo' type_id=9
>   [9] TYPE_TAG 'rcu' type_id=10
>   [10] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>   [11] VAR 'x' type_id=8, linkage=global
>   [12] FUNC 'do_thing' type_id=1 linkage=global
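
Reading the listing above, the tags on a type are the consecutive TYPE_TAG records met before the first record of another kind. A minimal sketch of that walk follows, using a hypothetical simplified table indexed by BTF ID; struct sketch_type and collect_type_tags are illustrative names only.

```c
#include <stddef.h>

enum sketch_kind { K_INT, K_PTR, K_TYPE_TAG };

/* Hypothetical, simplified BTF type record. */
struct sketch_type {
    enum sketch_kind kind;
    const char *name; /* tag string for K_TYPE_TAG records    */
    int type_id;      /* referenced type; 0 means none (void) */
};

/* Follow consecutive TYPE_TAG records starting at id and stop at the
   first record of another kind.  table is indexed by BTF ID.  Returns
   the number of tag strings stored in out (at most max). */
size_t collect_type_tags(const struct sketch_type *table, size_t table_len,
                         int id, const char **out, size_t max)
{
    size_t n = 0;
    while (id > 0 && (size_t)id < table_len &&
           table[id].kind == K_TYPE_TAG && n < max) {
        out[n++] = table[id].name;
        id = table[id].type_id;
    }
    return n;
}
```

Starting from ID 8 (the type of 'x') the walk yields "foo" then "rcu" and stops at INT [10]; starting from ID 2 (the type of 'rcu_s') it yields only "rcu" and stops at PTR [3].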
> 
> References
> ==========
> 
> [1] https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
> [2] https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
> [3] https://lore.kernel.org/bpf/20211112012604.1504583-1-...@fb.com/
> [4] https://gcc.gnu.org/wiki/cauldron2024#cauldron2024talks.what_is_new_in_the_bpf_support_in_the_gnu_toolchain
> [5] https://lpc.events/event/18/contributions/1924/
> [6] https://gcc.gnu.org/pipermail/gcc-patches/2022-April/592685.html
> [7] https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596355.html
> [8] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624156.html
> 
> 
> David Faust (5):
>   c-family: add btf_type_tag and btf_decl_tag attributes
>   dwarf: create annotation DIEs for btf tags
>   ctf: translate annotation DIEs to internal ctf
>   btf: generate and output DECL_TAG and TYPE_TAG records
>   doc: document btf_type_tag and btf_decl_tag attributes
> 
>  gcc/btfout.cc                                 | 171 +++++++++--
>  gcc/c-family/c-attribs.cc                     |  25 +-
>  gcc/ctfc.cc                                   |  70 ++++-
>  gcc/ctfc.h                                    |  41 ++-
>  gcc/doc/extend.texi                           |  68 +++++
>  gcc/dwarf2ctf.cc                              | 152 +++++++++-
>  gcc/dwarf2out.cc                              | 275 +++++++++++++++++-
>  .../gcc.dg/debug/btf/btf-decl-tag-1.c         |  14 +
>  .../gcc.dg/debug/btf/btf-decl-tag-2.c         |  22 ++
>  .../gcc.dg/debug/btf/btf-decl-tag-3.c         |  22 ++
>  .../gcc.dg/debug/btf/btf-decl-tag-4.c         |  34 +++
>  .../gcc.dg/debug/btf/btf-type-tag-1.c         |  27 ++
>  .../gcc.dg/debug/btf/btf-type-tag-2.c         |  15 +
>  .../gcc.dg/debug/btf/btf-type-tag-3.c         |  21 ++
>  .../gcc.dg/debug/btf/btf-type-tag-4.c         |  25 ++
>  .../gcc.dg/debug/btf/btf-type-tag-5.c         |  35 +++
>  .../gcc.dg/debug/btf/btf-type-tag-6.c         |  15 +
>  .../gcc.dg/debug/btf/btf-type-tag-c2x-1.c     |  23 ++
>  .../gcc.dg/debug/ctf/ctf-decl-tag-1.c         |  31 ++
>  .../gcc.dg/debug/ctf/ctf-type-tag-1.c         |  19 ++
>  .../debug/dwarf2/dwarf-btf-decl-tag-1.c       |  11 +
>  .../debug/dwarf2/dwarf-btf-decl-tag-2.c       |  25 ++
>  .../debug/dwarf2/dwarf-btf-decl-tag-3.c       |  21 ++
>  .../debug/dwarf2/dwarf-btf-type-tag-1.c       |  10 +
>  .../debug/dwarf2/dwarf-btf-type-tag-2.c       |  31 ++
>  .../debug/dwarf2/dwarf-btf-type-tag-3.c       |  15 +
>  include/btf.h                                 |  14 +
>  include/ctf.h                                 |   4 +
>  include/dwarf2.def                            |   4 +
>  29 files changed, 1194 insertions(+), 46 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-6.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-c2x-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-decl-tag-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-type-tag-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-3.c
> 
