Gentle ping for this series. https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675241.html
On 2/6/25 11:54, David Faust wrote:
> [v1: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666911.html
>  Changes from v1:
>  - Fix a bug in v1 related to generating DWARF for type tags applied to
>    struct or union types, especially if the type had multiple type tags
>    or was also part of a typedef.
>  - Simplify the dwarf2ctf translation of types having both cv-qualifiers
>    and type tags applied to them.
>  - Add a few new tests.
>  - Address review comments from v1. ]
>
> This patch series adds support for the btf_decl_tag and btf_type_tag
> attributes to GCC. This entails:
>
> - Two new C-family attributes that make it possible to associate (to "tag")
>   particular declarations and types with arbitrary strings. As explained
>   below, this is intended to be used to, for example, characterize certain
>   pointer types. A single declaration or type may have multiple occurrences
>   of these attributes.
>
> - The conveyance of that information in the DWARF output in the form of a new
>   DIE: DW_TAG_GNU_annotation, and a new attribute: DW_AT_GNU_annotation.
>
> - The conveyance of that information in the BTF output in the form of two new
>   kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. These BTF
>   kinds are already supported by LLVM and other tools in the BPF ecosystem.
>
> Both of these attributes are already supported by clang, and are beginning to
> be used in various ways by BPF users and inside the Linux kernel.
>
> It is worth noting that while the Linux kernel and BPF/BTF are the motivating
> use case of this feature, the format of the new DWARF extension is generic.
> This work could be easily adapted to provide a general way for program
> authors to annotate types and declarations with arbitrary information for any
> post-compilation analysis needs, not just the Linux kernel BPF verifier. For
> example, these annotations could be used to aid in ABI analysis.
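
For anyone who has not used the two attributes yet, here is a minimal sketch
of how they might be spelled in C. The macro names (BTF_TYPE_TAG,
BTF_DECL_TAG, my_user), the struct and the function are made up for
illustration and are not part of the series. The __has_attribute guard is my
own assumption, added so the snippet also builds with compilers that lack the
attributes.

```c
#include <stddef.h>

/* Illustrative only: all macro, struct and function names here are
   hypothetical and do not come from the patch series.  */
#ifndef __has_attribute
# define __has_attribute(x) 0   /* compilers without __has_attribute */
#endif

#if __has_attribute (btf_type_tag)
# define BTF_TYPE_TAG(s) __attribute__ ((btf_type_tag (s)))
#else
# define BTF_TYPE_TAG(s)        /* expands to nothing: the tag is dropped */
#endif

#if __has_attribute (btf_decl_tag)
# define BTF_DECL_TAG(s) __attribute__ ((btf_decl_tag (s)))
#else
# define BTF_DECL_TAG(s)
#endif

/* A type tag written in the position the kernel uses for __user,
   i.e. applied to the pointee type.  */
#define my_user BTF_TYPE_TAG ("user")

struct buffer
{
  int size;
  char *ptr BTF_DECL_TAG ("rw");   /* decl tag on a struct member */
};

/* The tags are meant to be recorded in debug info only; this function
   behaves the same with or without them.  */
size_t
count_bytes (const char my_user *src, size_t len)
{
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    if (src[i] != '\0')
      n++;
  return n;
}
```

Building something like this with -g (DWARF) or -gbtf (BTF) on a compiler
that supports the attributes is where the tags would be expected to show up.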
>
> Purpose
> =======
>
> 1)  Addition of C-family language constructs (attributes) to specify free-text
>     tags on certain language elements, such as struct fields.
>
>     The purpose of these annotations is to provide additional information
>     about types, variables, and function parameters of interest to the kernel.
>     A driving use case is to tag pointer types within the Linux kernel and BPF
>     programs with additional semantic information, such as '__user' or
>     '__rcu'.
>
>     For example, consider the Linux kernel function do_execve with the
>     following declaration:
>
>       static int do_execve(struct filename *filename,
>                            const char __user *const __user *__argv,
>                            const char __user *const __user *__envp);
>
>     Here, __user could be defined with these annotations to record semantic
>     information about the pointer parameters (e.g., they are user-provided) in
>     DWARF and BTF information. Other kernel facilities such as the BPF
>     verifier can read the tags and make use of the information.
>
> 2)  Conveying the tags in the generated DWARF debug info.
>
>     The main motivation for emitting the tags in DWARF is that the Linux
>     kernel generates its BTF information via pahole, using DWARF as a source:
>
>         +--------+  BTF                  BTF   +----------+
>         | pahole |-------> vmlinux.btf ------->| verifier |
>         +--------+                             +----------+
>             ^                                        ^
>             |                                        |
>       DWARF |                                    BTF |
>             |                                        |
>          vmlinux                              +-------------+
>          module1.ko                           | BPF program |
>          module2.ko                           +-------------+
>            ...
>
>     This is because:
>
>     a)  Unlike GCC, LLVM will only generate BTF for BPF programs.
>
>     b)  GCC can generate BTF for whatever target with -gbtf, but there is no
>         support for linking/deduplicating BTF in the linker.
>
>     c)  pahole injects additional BTF information based on specific knowledge
>         of kernel objects which is not available to the compiler.
>
>     In the scenario above, the verifier needs access to the pointer tags of
>     both the kernel types/declarations (conveyed in the DWARF and translated
>     to BTF by pahole) and those of the BPF program (available directly in
>     BTF).
>
>     Another motivation for having the tag information in DWARF, unrelated to
>     BPF and BTF, is that the drgn project (another DWARF consumer) also wants
>     to benefit from these tags in order to differentiate between different
>     kinds of pointers in the kernel.
>
> 3)  Conveying the tags in the generated BTF debug info.
>
>     This is easy: the main purpose of having this info in BTF is for the
>     compiled BPF programs. The kernel verifier can then access the tags
>     of pointers used by the BPF programs.
>
> For more information about these tags and the motivation behind them, please
> refer to the following Linux kernel discussions: [1], [2], [3].
>
> DWARF Representation
> ====================
>
> Compared to prior iterations of this work, this patch series introduces a new
> DWARF representation meant to address issues in the previously proposed
> format. The format is detailed below.
>
> Note that the obvious solution of introducing a new DIE to be chained in type
> chains similar to type modifiers like const and volatile is not feasible
> because it would break DWARF readers.
>
> New DWARF extension: DW_TAG_GNU_annotation. These DIEs encode the annotation
> information. They exist near the top level of the DIE tree as children of the
> compilation unit DIE. The user-supplied annotations ("tags") are encoded via
> DW_AT_name and DW_AT_const_value. DW_AT_name holds the name of the attribute
> which is the source of the annotation (currently only "btf_type_tag" or
> "btf_decl_tag"). DW_AT_const_value holds the arbitrary user string from the
> attribute argument.
>
>   DW_TAG_GNU_annotation
>     DW_AT_name: "btf_decl_tag" or "btf_type_tag"
>     DW_AT_const_value: <arbitrary user-provided string from attribute arg>
>     DW_AT_GNU_annotation: see below.
>
> New DWARF extension: DW_AT_GNU_annotation. If present, the
> DW_AT_GNU_annotation attribute is a reference to a DW_TAG_GNU_annotation DIE
> holding annotations for the object.
>
> If a single declaration or type at the language level has multiple occurrences
> of the btf_decl_tag or btf_type_tag attribute, then the DW_TAG_GNU_annotation
> DIE referenced by that object will itself have DW_AT_GNU_annotation referring
> to another annotation DIE. In this way the annotation DIEs are chained
> together.
>
> Multiple distinct declarations or types may refer via DW_AT_GNU_annotation to
> the same DW_TAG_GNU_annotation DIE, if they share the same tags.
>
> For more information on this format, please refer to recent talks at GNU Tools
> Cauldron [4] and Linux Plumbers Conference [5]. Older iterations of this work
> and related discussions may be found in [6,7,8].
>
> BTF Representation
> ==================
>
> In BTF, BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG records convey the
> annotations. These records hold the annotation value in their name field, and
> refer to the annotated object by BTF ID.
>
> BTF_KIND_DECL_TAG records are followed by an additional 32-bit
> 'component_idx', which indicates to which component of an object the tag
> applies. This index is -1 if the tag applies to a variable or function
> declaration itself, otherwise it is a 0-based index indicating to which
> function argument or struct or union member the tag applies.
>
> Example: btf_decl_tag
> =====================
>
> Consider the following declarations:
>
>   int *x __attribute__((btf_decl_tag ("rw"), btf_decl_tag ("devicemem")));
>   struct {
>     int size;
>     char *ptr __attribute__((btf_decl_tag("rw")));
>   } y;
>
> These declarations produce the following DWARF information:
>
>  <1><1e>: Abbrev Number: 3 (DW_TAG_variable)
>     <1f>   DW_AT_name        : x
>     <24>   DW_AT_type        : <0x36>
>     <28>   DW_AT_GNU_annotation: <0x4a>
>     ...
>  <1><36>: Abbrev Number: 1 (DW_TAG_pointer_type)
>     <37>   DW_AT_byte_size   : 8
>     <37>   DW_AT_type        : <0x3b>
>  <1><3b>: Abbrev Number: 4 (DW_TAG_base_type)
>     <3e>   DW_AT_name        : int
>     ...
>  <1><42>: Abbrev Number: 5 (DW_TAG_GNU_annotation)
>     <43>   DW_AT_name        : (indirect string, offset: 0): btf_decl_tag
>     <47>   DW_AT_const_value : rw
>  <1><4a>: Abbrev Number: 6 (DW_TAG_GNU_annotation)
>     <4b>   DW_AT_name        : (indirect string, offset: 0): btf_decl_tag
>     <4f>   DW_AT_const_value : (indirect string, offset: 0x1f): devicemem
>     <53>   DW_AT_GNU_annotation: <0x42>
>  <1><57>: Abbrev Number: 7 (DW_TAG_structure_type)
>     ...
>  <2><60>: Abbrev Number: 8 (DW_TAG_member)
>     <61>   DW_AT_name        : (indirect string, offset: 0x1a): size
>     <68>   DW_AT_type        : <0x3b>
>     ...
>  <2><6d>: Abbrev Number: 9 (DW_TAG_member)
>     <6e>   DW_AT_name        : ptr
>     <75>   DW_AT_type        : <0x7f>
>     <7a>   DW_AT_GNU_annotation: <0x42>
>     ...
>  <2><7e>: Abbrev Number: 0
>  <1><7f>: Abbrev Number: 1 (DW_TAG_pointer_type)
>     <80>   DW_AT_byte_size   : 8
>     <80>   DW_AT_type        : <0x84>
>  <1><84>: Abbrev Number: 10 (DW_TAG_base_type)
>     <85>   DW_AT_byte_size   : 1
>     <86>   DW_AT_encoding    : 6 (signed char)
>     <87>   DW_AT_name        : (indirect string, offset: 0x5e): char
>  <1><8b>: Abbrev Number: 11 (DW_TAG_variable)
>     <8c>   DW_AT_name        : y
>     <91>   DW_AT_type        : <0x57>
>     ...
>
> The variable DIE for 'x' refers by DW_AT_GNU_annotation to the DIE holding the
> annotation for the "devicemem" tag, which in turn refers to the DIE holding
> the annotation for "rw". The DW_TAG_member DIE for the member 'ptr' of the
> struct refers to the annotation DIE for "rw" directly, which is thereby shared
> between the two declarations.
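
To make the chaining and sharing described above concrete, here is a small
sketch of how a consumer might walk an annotation chain. This is a toy
in-memory model I wrote for illustration, not code from the series and not a
real DWARF reader API: struct annotation_die, collect_tags and the two
static objects are hypothetical stand-ins for the DIEs at <0x42> and <0x4a>.

```c
#include <stddef.h>

/* Toy stand-in for a DW_TAG_GNU_annotation DIE (hypothetical type).  */
struct annotation_die
{
  const char *name;                   /* DW_AT_name */
  const char *value;                  /* DW_AT_const_value */
  const struct annotation_die *next;  /* DW_AT_GNU_annotation, or NULL */
};

/* The two annotation DIEs from the btf_decl_tag example.  */
static const struct annotation_die die_rw
  = { "btf_decl_tag", "rw", NULL };               /* <0x42> */
static const struct annotation_die die_devicemem
  = { "btf_decl_tag", "devicemem", &die_rw };     /* <0x4a> */

/* What the DIEs for 'x' and 'ptr' reference via DW_AT_GNU_annotation.
   Both chains end in the same "rw" DIE, which is therefore shared.  */
const struct annotation_die *const x_annotation = &die_devicemem;
const struct annotation_die *const ptr_annotation = &die_rw;

/* Follow the chain starting at DIE and store up to MAX tag strings in
   OUT.  Returns the number of tags stored.  */
size_t
collect_tags (const struct annotation_die *die, const char **out, size_t max)
{
  size_t n = 0;
  for (; die != NULL && n < max; die = die->next)
    out[n++] = die->value;
  return n;
}
```

Walking from x_annotation visits "devicemem" and then "rw"; walking from
ptr_annotation visits only "rw", through the very same object.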
>
> And BTF information:
>
>   [1] STRUCT '(anon)' size=16 vlen=2
>       'size' type_id=2 bits_offset=0
>       'ptr' type_id=3 bits_offset=64
>   [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>   [3] PTR '(anon)' type_id=4
>   [4] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
>   [5] PTR '(anon)' type_id=2
>   [6] DECL_TAG 'devicemem' type_id=10 component_idx=-1
>   [7] DECL_TAG 'rw' type_id=10 component_idx=-1
>   [8] DECL_TAG 'rw' type_id=1 component_idx=1
>   [9] VAR 'y' type_id=1, linkage=global
>   [10] VAR 'x' type_id=5, linkage=global
>
> Note how the component_idx identifies to which member of the struct type the
> decl tag is applied.
>
>
> Example: btf_type_tag
> =====================
>
> Consider the following code snippet:
>
>   int __attribute__((btf_type_tag("rcu"), btf_type_tag ("foo"))) x;
>
>   void
>   do_thing (struct S * __attribute__((btf_type_tag ("rcu"))) rcu_s,
>             void * __attribute__((btf_type_tag("foo"))) ptr)
>   { ... }
>
> The relevant DWARF information produced is as follows:
>
>  <1><2e>: Abbrev Number: 3 (DW_TAG_structure_type)
>     <2f>   DW_AT_name        : S
>     ...
>  <1><46>: Abbrev Number: 5 (DW_TAG_base_type)
>     <47>   DW_AT_byte_size   : 4
>     <48>   DW_AT_encoding    : 5 (signed)
>     <49>   DW_AT_name        : int
>  <1><4d>: Abbrev Number: 6 (DW_TAG_variable)
>     <4e>   DW_AT_name        : x
>     <53>   DW_AT_type        : <0x61>
>     ...
>  <1><61>: Abbrev Number: 7 (DW_TAG_base_type)
>     <62>   DW_AT_byte_size   : 4
>     <63>   DW_AT_encoding    : 5 (signed)
>     <64>   DW_AT_name        : int
>     <68>   DW_AT_GNU_annotation: <0x75>
>  <1><6c>: Abbrev Number: 1 (DW_TAG_GNU_annotation)
>     <6d>   DW_AT_name        : (indirect string, offset: 0x13): btf_type_tag
>     <71>   DW_AT_const_value : rcu
>  <1><75>: Abbrev Number: 8 (DW_TAG_GNU_annotation)
>     <76>   DW_AT_name        : (indirect string, offset: 0x13): btf_type_tag
>     <7a>   DW_AT_const_value : foo
>     <7e>   DW_AT_GNU_annotation: <0x6c>
>  <1><82>: Abbrev Number: 9 (DW_TAG_subprogram)
>     <83>   DW_AT_name        : (indirect string, offset: 0x20): do_thing
>     ...
>  <2><a1>: Abbrev Number: 10 (DW_TAG_formal_parameter)
>     <a2>   DW_AT_name        : (indirect string, offset: 0x5): rcu_s
>     <a9>   DW_AT_type        : <0xc0>
>     ...
>  <2><b0>: Abbrev Number: 11 (DW_TAG_formal_parameter)
>     <b1>   DW_AT_name        : ptr
>     <b8>   DW_AT_type        : <0xca>
>     ...
>  <2><bf>: Abbrev Number: 0
>  <1><c0>: Abbrev Number: 12 (DW_TAG_pointer_type)
>     <c1>   DW_AT_byte_size   : 8
>     <c2>   DW_AT_type        : <0x2e>
>     <c6>   Unknown AT value: 6000: <0x6c>
>  <1><ca>: Abbrev Number: 13 (DW_TAG_pointer_type)
>     <cb>   DW_AT_byte_size   : 8
>     <cc>   DW_AT_GNU_annotation: <0xd0>
>  <1><d0>: Abbrev Number: 1 (DW_TAG_GNU_annotation)
>     <d1>   DW_AT_name        : (indirect string, offset: 0x13): btf_type_tag
>     <d5>   DW_AT_const_value : foo
>
> Note how in this case, two annotation DIEs for "foo" are produced, because
> it is used in two distinct sets of type tags which do not allow it to be
> shared. The DIE for "rcu", however, is shared between uses.
>
> And BTF information:
>
>   [1] FUNC_PROTO '(anon)' ret_type_id=0 vlen=2
>       'rcu_s' type_id=2
>       'ptr' type_id=6
>   [2] TYPE_TAG 'rcu' type_id=3
>   [3] PTR '(anon)' type_id=4
>   [4] STRUCT 'S' size=4 vlen=1
>       ...
>   [6] TYPE_TAG 'foo' type_id=7
>   [7] PTR '(anon)' type_id=0
>   [8] TYPE_TAG 'foo' type_id=9
>   [9] TYPE_TAG 'rcu' type_id=10
>   [10] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>   [11] VAR 'x' type_id=8, linkage=global
>   [12] FUNC 'do_thing' type_id=1 linkage=global
>
> References
> ==========
>
> [1] https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
> [2] https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
> [3] https://lore.kernel.org/bpf/20211112012604.1504583-1-...@fb.com/
> [4] https://gcc.gnu.org/wiki/cauldron2024#cauldron2024talks.what_is_new_in_the_bpf_support_in_the_gnu_toolchain
> [5] https://lpc.events/event/18/contributions/1924/
> [6] https://gcc.gnu.org/pipermail/gcc-patches/2022-April/592685.html
> [7] https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596355.html
> [8] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624156.html
>
>
> David Faust (5):
>   c-family: add btf_type_tag and btf_decl_tag attributes
>   dwarf: create annotation DIEs for btf tags
>   ctf: translate annotation DIEs to internal ctf
>   btf: generate and output DECL_TAG and TYPE_TAG records
>   doc: document btf_type_tag and btf_decl_tag attributes
>
>  gcc/btfout.cc                             | 171 +++++++++--
>  gcc/c-family/c-attribs.cc                 |  25 +-
>  gcc/ctfc.cc                               |  70 ++++-
>  gcc/ctfc.h                                |  41 ++-
>  gcc/doc/extend.texi                       |  68 +++++
>  gcc/dwarf2ctf.cc                          | 152 +++++++++-
>  gcc/dwarf2out.cc                          | 275 +++++++++++++++++-
>  .../gcc.dg/debug/btf/btf-decl-tag-1.c     |  14 +
>  .../gcc.dg/debug/btf/btf-decl-tag-2.c     |  22 ++
>  .../gcc.dg/debug/btf/btf-decl-tag-3.c     |  22 ++
>  .../gcc.dg/debug/btf/btf-decl-tag-4.c     |  34 +++
>  .../gcc.dg/debug/btf/btf-type-tag-1.c     |  27 ++
>  .../gcc.dg/debug/btf/btf-type-tag-2.c     |  15 +
>  .../gcc.dg/debug/btf/btf-type-tag-3.c     |  21 ++
>  .../gcc.dg/debug/btf/btf-type-tag-4.c     |  25 ++
>  .../gcc.dg/debug/btf/btf-type-tag-5.c     |  35 +++
>  .../gcc.dg/debug/btf/btf-type-tag-6.c     |  15 +
>  .../gcc.dg/debug/btf/btf-type-tag-c2x-1.c |  23 ++
>  .../gcc.dg/debug/ctf/ctf-decl-tag-1.c     |  31 ++
>  .../gcc.dg/debug/ctf/ctf-type-tag-1.c     |  19 ++
>  .../debug/dwarf2/dwarf-btf-decl-tag-1.c   |  11 +
>  .../debug/dwarf2/dwarf-btf-decl-tag-2.c   |  25 ++
>  .../debug/dwarf2/dwarf-btf-decl-tag-3.c   |  21 ++
>  .../debug/dwarf2/dwarf-btf-type-tag-1.c   |  10 +
>  .../debug/dwarf2/dwarf-btf-type-tag-2.c   |  31 ++
>  .../debug/dwarf2/dwarf-btf-type-tag-3.c   |  15 +
>  include/btf.h                             |  14 +
>  include/ctf.h                             |   4 +
>  include/dwarf2.def                        |   4 +
>  29 files changed, 1194 insertions(+), 46 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-6.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-c2x-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-decl-tag-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-type-tag-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-3.c
>