On 10/30/24 11:31 AM, David Faust wrote:
This patch series adds support for the btf_decl_tag and btf_type_tag
attributes to GCC. This entails:

- Two new C-family attributes that associate ("tag") particular
  declarations and types with arbitrary strings. As explained below,
  this is intended to be used to, for example, characterize certain
  pointer types. A single declaration or type may have multiple
  occurrences of these attributes.

- The conveyance of that information in the DWARF output in the form of
  a new DIE: DW_TAG_GNU_annotation, and a new attribute:
  DW_AT_GNU_annotation.

- The conveyance of that information in the BTF output in the form of
  two new kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
  These BTF kinds are already supported by LLVM and other tools in the
  BPF ecosystem.

Both of these attributes are already supported by clang, and are
beginning to be used in various ways by eBPF users and inside the Linux
kernel.

Purpose
=======

1)  Addition of C-family language constructs (attributes) to specify
    free-text tags on certain language elements, such as struct fields.

    The purpose of these annotations is to provide additional
    information about types, variables, and function parameters of
    interest to the kernel. A driving use case is to tag pointer types
    within the Linux kernel and eBPF programs with additional semantic
    information, such as '__user' or '__rcu'.

    For example, consider the Linux kernel function do_execve with the
    following declaration:

      static int do_execve(struct filename *filename,
         const char __user *const __user *__argv,
         const char __user *const __user *__envp);

    Here, __user could be defined with these annotations to record
    semantic information about the pointer parameters (e.g., they are
    user-provided) in DWARF and BTF information. Other kernel
    facilities such as the eBPF verifier can read the tags and make use
    of the information.

2)  Conveying the tags in the generated DWARF debug info.
    The main motivation for emitting the tags in DWARF is that the
    Linux kernel generates its BTF information via pahole, using DWARF
    as a source:

        +--------+  BTF                  BTF   +----------+
        | pahole |-------> vmlinux.btf ------->| verifier |
        +--------+                             +----------+
            ^                                        ^
            |                                        |
      DWARF |                                    BTF |
            |                                        |
         vmlinux                              +-------------+
         module1.ko                           | BPF program |
         module2.ko                           +-------------+
           ...

    This is because:

    a)  Unlike GCC, LLVM will only generate BTF for BPF programs.

    b)  GCC can generate BTF for whatever target with -gbtf, but there
        is no support for linking/deduplicating BTF in the linker.

    In the scenario above, the verifier needs access to the pointer
    tags of both the kernel types/declarations (conveyed in the DWARF
    and translated to BTF by pahole) and those of the BPF program
    (available directly in BTF).

    Another motivation for having the tag information in DWARF,
    unrelated to BPF and BTF, is that the drgn project (another DWARF
    consumer) also wants to benefit from these tags in order to
    differentiate between different kinds of pointers in the kernel.

3)  Conveying the tags in the generated BTF debug info.

    This is easy: the main purpose of having this info in BTF is for
    the compiled eBPF programs. The kernel verifier can then access
    the tags of pointers used by the eBPF programs.

For more information about these tags and the motivation behind them,
please refer to the following Linux kernel discussions: [1], [2], [3].

DWARF Representation
====================

Compared to prior iterations of this work, this patch series introduces
a new DWARF representation meant to address issues in the previous
format. The format is detailed below.

New DWARF extension: DW_TAG_GNU_annotation.

These DIEs encode the annotation information. They exist near the top
level of the DIE tree as children of the compilation unit DIE. The
user-supplied annotations ("tags") are encoded via DW_AT_name and
DW_AT_const_value.
DW_AT_name holds the name of the attribute which is the source of the
annotation (currently only "btf_type_tag" or "btf_decl_tag").
DW_AT_const_value holds the arbitrary user string from the attribute
argument.

  DW_TAG_GNU_annotation
    DW_AT_name: "btf_decl_tag" or "btf_type_tag"
    DW_AT_const_value: <arbitrary user-provided string from attribute arg>
    DW_AT_GNU_annotation: see below.

New DWARF extension: DW_AT_GNU_annotation.

If present, the DW_AT_GNU_annotation attribute is a reference to a
DW_TAG_GNU_annotation DIE holding annotations for the object.

If a single declaration or type at the language level has multiple
occurrences of btf_decl_tag or btf_type_tag attribute, then the
DW_TAG_GNU_annotation DIE referenced by that object will itself have
DW_AT_GNU_annotation referring to another annotation DIE. In this way
the annotation DIEs are chained together.

Multiple distinct declarations or types may refer via
DW_AT_GNU_annotation to the same DW_TAG_GNU_annotation DIE, if they
share the same tags.

For more information on this format, please refer to recent talks at
GNU Tools Cauldron [4] and Linux Plumbers Conference [5]. Older
iterations of this work and related discussions may be found in
[6,7,8].

BTF Representation
==================

In BTF, BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG records convey the
annotations. These records hold the annotation value in their name
field, and refer to the annotated object by BTF ID.

BTF_KIND_DECL_TAG records are followed by an additional 32-bit
'component_idx', which indicates to which component of an object the
tag applies. This index is -1 if the tag applies to a variable or
function declaration itself, otherwise it is a 0-based index indicating
to which function argument or struct or union member the tag applies.
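
To make the record layout described above a little more concrete, here
is a minimal C sketch of how a BTF consumer might classify a tag
record. It is an illustration only, not code from this series. The
struct and helper names (btf_type_hdr, btf_decl_tag_extra,
btf_kind_of, classify_decl_tag) are hypothetical, and the bit layout of
'info' and the kind values 17 and 18 are assumptions based on my
reading of the Linux UAPI header include/uapi/linux/btf.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Common BTF record header (assumed to mirror the kernel's
   struct btf_type; the name here is hypothetical).  */
struct btf_type_hdr {
  uint32_t name_off;  /* string table offset of the tag value */
  uint32_t info;      /* assumed: bits 24-28 hold the kind */
  uint32_t type;      /* BTF ID of the annotated object */
};

/* Extra 32-bit word that follows a DECL_TAG record.  */
struct btf_decl_tag_extra {
  int32_t component_idx;  /* -1: the declaration itself */
};

/* Assumed kind values for the two tag records.  */
enum { KIND_DECL_TAG = 17, KIND_TYPE_TAG = 18 };

enum tag_target { TAG_ON_DECL, TAG_ON_COMPONENT };

/* Extract the kind from the 'info' word.  */
static inline unsigned
btf_kind_of (uint32_t info)
{
  return (info >> 24) & 0x1f;
}

/* Decide whether a decl tag applies to the whole declaration or to
   one member/argument, following the component_idx rule above.  */
static inline enum tag_target
classify_decl_tag (const struct btf_decl_tag_extra *extra)
{
  assert (extra != NULL);
  return extra->component_idx == -1 ? TAG_ON_DECL : TAG_ON_COMPONENT;
}
```

With this model, a record tagging a struct member carries that member's
0-based index in component_idx, while a record tagging the variable
itself carries -1.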
Example: btf_decl_tag
=====================

Consider the following declarations:

  int *x __attribute__((btf_decl_tag ("rw"), btf_decl_tag ("devicemem")));

  struct {
    int size;
    char *ptr __attribute__((btf_decl_tag("rw")));
  } y;

These declarations produce the following DWARF information:

 <1><1e>: Abbrev Number: 3 (DW_TAG_variable)
    <1f>   DW_AT_name        : x
    <24>   DW_AT_type        : <0x36>
    <28>   DW_AT_GNU_annotation: <0x4a>
    ...
 <1><36>: Abbrev Number: 1 (DW_TAG_pointer_type)
    <37>   DW_AT_byte_size   : 8
    <37>   DW_AT_type        : <0x3b>
 <1><3b>: Abbrev Number: 4 (DW_TAG_base_type)
    <3e>   DW_AT_name        : int
    ...
 <1><42>: Abbrev Number: 5 (DW_TAG_GNU_annotation)
    <43>   DW_AT_name        : (indirect string, offset: 0): btf_decl_tag
    <47>   DW_AT_const_value : rw
 <1><4a>: Abbrev Number: 6 (DW_TAG_GNU_annotation)
    <4b>   DW_AT_name        : (indirect string, offset: 0): btf_decl_tag
    <4f>   DW_AT_const_value : (indirect string, offset: 0x1f): devicemem
    <53>   DW_AT_GNU_annotation: <0x42>
 <1><57>: Abbrev Number: 7 (DW_TAG_structure_type)
    ...
 <2><60>: Abbrev Number: 8 (DW_TAG_member)
    <61>   DW_AT_name        : (indirect string, offset: 0x1a): size
    <68>   DW_AT_type        : <0x3b>
    ...
 <2><6d>: Abbrev Number: 9 (DW_TAG_member)
    <6e>   DW_AT_name        : ptr
    <75>   DW_AT_type        : <0x7f>
    <7a>   DW_AT_GNU_annotation: <0x42>
    ...
 <2><7e>: Abbrev Number: 0
 <1><7f>: Abbrev Number: 1 (DW_TAG_pointer_type)
    <80>   DW_AT_byte_size   : 8
    <80>   DW_AT_type        : <0x84>
 <1><84>: Abbrev Number: 10 (DW_TAG_base_type)
    <85>   DW_AT_byte_size   : 1
    <86>   DW_AT_encoding    : 6 (signed char)
    <87>   DW_AT_name        : (indirect string, offset: 0x5e): char
 <1><8b>: Abbrev Number: 11 (DW_TAG_variable)
    <8c>   DW_AT_name        : y
    <91>   DW_AT_type        : <0x57>
    ...

The variable DIE for 'x' refers by DW_AT_GNU_annotation to the DIE
holding the annotation for the "devicemem" tag, which in turn refers to
the DIE holding the annotation for "rw". The DW_TAG_member DIE for the
member 'ptr' of the struct refers to the annotation DIE for "rw"
directly, which is thereby shared between the two declarations.
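
As an aside, the chaining and sharing just described can be sketched in
a few lines of C. This is only a toy in-memory model for illustration:
struct annotation_die and collect_tags are hypothetical names, not part
of GCC or of any DWARF library, and a real consumer would read these
attributes through its DWARF parser instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory model of a DW_TAG_GNU_annotation DIE.  */
struct annotation_die {
  const char *name;                   /* DW_AT_name */
  const char *value;                  /* DW_AT_const_value */
  const struct annotation_die *next;  /* DW_AT_GNU_annotation, or NULL */
};

/* Follow the DW_AT_GNU_annotation chain starting at FIRST.  Stores up
   to MAX tag values in OUT and returns the chain length.  */
static size_t
collect_tags (const struct annotation_die *first,
              const char **out, size_t max)
{
  size_t n = 0;

  assert (out != NULL || max == 0);
  for (const struct annotation_die *d = first; d != NULL; d = d->next)
    {
      if (n < max)
        out[n] = d->value;
      n++;
    }
  return n;
}

/* The DIEs from the example: 'x' refers to die_devicemem, which chains
   to die_rw; the member 'ptr' refers to die_rw directly.  */
static const struct annotation_die die_rw =
  { "btf_decl_tag", "rw", NULL };
static const struct annotation_die die_devicemem =
  { "btf_decl_tag", "devicemem", &die_rw };
```

Walking from die_devicemem visits "devicemem" and then "rw", while
walking from die_rw visits only "rw", which is how a single annotation
DIE can serve both declarations.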
And BTF information:

  [1] STRUCT '(anon)' size=16 vlen=2
      'size' type_id=2 bits_offset=0
      'ptr' type_id=3 bits_offset=64
  [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
  [3] PTR '(anon)' type_id=4
  [4] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
  [5] PTR '(anon)' type_id=2
  [6] DECL_TAG 'devicemem' type_id=10 component_idx=-1
  [7] DECL_TAG 'rw' type_id=10 component_idx=-1
  [8] DECL_TAG 'rw' type_id=1 component_idx=1
  [9] VAR 'y' type_id=1, linkage=global
  [10] VAR 'x' type_id=5, linkage=global

Note how the component_idx identifies to which member of the struct
type the decl tag is applied.

Example: btf_type_tag
=====================

Consider the following code snippet:

  int __attribute__((btf_type_tag("rcu"), btf_type_tag ("foo"))) x;

  void
  do_thing (struct S * __attribute__((btf_type_tag ("rcu"))) rcu_s,
            void * __attribute__((btf_type_tag("foo"))) ptr)
  { ... }

The relevant DWARF information produced is as follows:

 <1><2e>: Abbrev Number: 3 (DW_TAG_structure_type)
    <2f>   DW_AT_name        : S
    ...
 <1><46>: Abbrev Number: 5 (DW_TAG_base_type)
    <47>   DW_AT_byte_size   : 4
    <48>   DW_AT_encoding    : 5 (signed)
    <49>   DW_AT_name        : int
 <1><4d>: Abbrev Number: 6 (DW_TAG_variable)
    <4e>   DW_AT_name        : x
    <53>   DW_AT_type        : <0x61>
    ...
 <1><61>: Abbrev Number: 7 (DW_TAG_base_type)
    <62>   DW_AT_byte_size   : 4
    <63>   DW_AT_encoding    : 5 (signed)
    <64>   DW_AT_name        : int
    <68>   DW_AT_GNU_annotation: <0x75>
 <1><6c>: Abbrev Number: 1 (DW_TAG_GNU_annotation)
    <6d>   DW_AT_name        : (indirect string, offset: 0x13): btf_type_tag
    <71>   DW_AT_const_value : rcu
 <1><75>: Abbrev Number: 8 (DW_TAG_GNU_annotation)
    <76>   DW_AT_name        : (indirect string, offset: 0x13): btf_type_tag
    <7a>   DW_AT_const_value : foo
    <7e>   DW_AT_GNU_annotation: <0x6c>
 <1><82>: Abbrev Number: 9 (DW_TAG_subprogram)
    <83>   DW_AT_name        : (indirect string, offset: 0x20): do_thing
    ...
 <2><a1>: Abbrev Number: 10 (DW_TAG_formal_parameter)
    <a2>   DW_AT_name        : (indirect string, offset: 0x5): rcu_s
    <a9>   DW_AT_type        : <0xc0>
    ...
 <2><b0>: Abbrev Number: 11 (DW_TAG_formal_parameter)
    <b1>   DW_AT_name        : ptr
    <b8>   DW_AT_type        : <0xca>
    ...
 <2><bf>: Abbrev Number: 0
 <1><c0>: Abbrev Number: 12 (DW_TAG_pointer_type)
    <c1>   DW_AT_byte_size   : 8
    <c2>   DW_AT_type        : <0x2e>
    <c6>   Unknown AT value: 6000: <0x6c>
 <1><ca>: Abbrev Number: 13 (DW_TAG_pointer_type)
    <cb>   DW_AT_byte_size   : 8
    <cc>   DW_AT_GNU_annotation: <0xd0>
 <1><d0>: Abbrev Number: 1 (DW_TAG_GNU_annotation)
    <d1>   DW_AT_name        : (indirect string, offset: 0x13): btf_type_tag
    <d5>   DW_AT_const_value : foo

Note how in this case, two annotation DIEs for "foo" are produced,
because it is used in two distinct sets of type tags which do not allow
it to be shared. The DIE for "rcu", however, is shared between uses.

And BTF information:

  [1] FUNC_PROTO '(anon)' ret_type_id=0 vlen=2
      'rcu_s' type_id=2
      'ptr' type_id=6
  [2] TYPE_TAG 'rcu' type_id=3
  [3] PTR '(anon)' type_id=4
  [4] STRUCT 'S' size=4 vlen=1
      ...
  [6] TYPE_TAG 'foo' type_id=7
  [7] PTR '(anon)' type_id=0
  [8] TYPE_TAG 'foo' type_id=9
  [9] TYPE_TAG 'rcu' type_id=10
  [10] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
  [11] VAR 'x' type_id=8, linkage=global
  [12] FUNC 'do_thing' type_id=1 linkage=global

References
==========

[1] https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
[2] https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
[3] https://lore.kernel.org/bpf/20211112012604.1504583-1-...@fb.com/
[4] https://gcc.gnu.org/wiki/cauldron2024#cauldron2024talks.what_is_new_in_the_bpf_support_in_the_gnu_toolchain
[5] https://lpc.events/event/18/contributions/1924/
[6] https://gcc.gnu.org/pipermail/gcc-patches/2022-April/592685.html
[7] https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596355.html
[8] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624156.html

Bootstrapped and tested on x86_64-linux-gnu. No known regressions.
Also tested with bpf-unknown-none, and with the Linux kernel BPF
selftests which make use of the two attributes.
David Faust (5):
  c-family: add btf_type_tag and btf_decl_tag attributes
  dwarf: create annotation DIEs for btf tags
  ctf: translate annotation DIEs to internal ctf
  btf: generate and output DECL_TAG and TYPE_TAG records
  doc: document btf_type_tag and btf_decl_tag attributes

 gcc/btfout.cc                                | 176 ++++++++++--
 gcc/c-family/c-attribs.cc                    |  25 +-
 gcc/ctfc.cc                                  |  66 ++++-
 gcc/ctfc.h                                   |  41 ++-
 gcc/doc/extend.texi                          |  68 +++++
 gcc/dwarf2ctf.cc                             | 180 ++++++++++++-
 gcc/dwarf2out.cc                             | 253 +++++++++++++++++-
 .../gcc.dg/debug/btf/btf-decl-tag-1.c        |  14 +
 .../gcc.dg/debug/btf/btf-decl-tag-2.c        |  22 ++
 .../gcc.dg/debug/btf/btf-decl-tag-3.c        |  22 ++
 .../gcc.dg/debug/btf/btf-decl-tag-4.c        |  34 +++
 .../gcc.dg/debug/btf/btf-type-tag-1.c        |  27 ++
 .../gcc.dg/debug/btf/btf-type-tag-2.c        |  17 ++
 .../gcc.dg/debug/btf/btf-type-tag-3.c        |  21 ++
 .../gcc.dg/debug/btf/btf-type-tag-4.c        |  25 ++
 .../gcc.dg/debug/btf/btf-type-tag-c2x-1.c    |  23 ++
 .../debug/dwarf2/dwarf-btf-decl-tag-1.c      |  11 +
 .../debug/dwarf2/dwarf-btf-decl-tag-2.c      |  25 ++
 .../debug/dwarf2/dwarf-btf-decl-tag-3.c      |  21 ++
 .../debug/dwarf2/dwarf-btf-type-tag-1.c      |  10 +
 .../debug/dwarf2/dwarf-btf-type-tag-2.c      |  31 +++
 .../debug/dwarf2/dwarf-btf-type-tag-3.c      |  15 ++
 include/btf.h                                |  14 +
 include/ctf.h                                |   4 +
 include/dwarf2.def                           |   4 +
 25 files changed, 1106 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-4.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-4.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-c2x-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-3.c
Any update on this patch set?

BPF-GCC CI is up now. See:
https://lore.kernel.org/bpf/mMhcrHuvf5fyjPwMa19kug9DHQH9yYcCJXKfaFMXhfQlKIuColex7zg7G6qpPqlfF74-IqzkhpZSlzsgvgikc-u6oQp27dNzFQAAatRaEuU=@pm.me/

Compilation of the BPF selftests works now. The next step will be to
fix the tests. The btf_{type,decl}_tag attributes are important to BPF
programs, and btf_decl_tag in particular is heavily used for testing.
It would be great if this patch set could be fixed and merged soon.