On 6/14/22 22:53, Yonghong Song wrote:
> 
> 
> On 6/7/22 2:43 PM, David Faust wrote:
>> Hello,
>>
>> This patch series adds support for:
>>
>> - Two new C-language-level attributes that allow associating (i.e.,
>>    "annotating" or "tagging") particular declarations and types with
>>    arbitrary strings. As explained below, this is intended to be used to,
>>    for example, characterize certain pointer types.
>>
>> - The conveyance of that information in the DWARF output in the form of a new
>>    DIE: DW_TAG_GNU_annotation.
>>
>> - The conveyance of that information in the BTF output in the form of two new
>>    kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
>>
>> All of these facilities are being added to the eBPF ecosystem, and support
>> for them exists in some form in LLVM.
>>
>> Purpose
>> =======
>>
>> 1)  Addition of C-family language constructs (attributes) to specify
>>      free-text tags on certain language elements, such as struct fields.
>>
>>      The purpose of these annotations is to provide additional information
>>      about types, variables, and function parameters of interest to the
>>      kernel. A driving use case is to tag pointer types within the Linux
>>      kernel and eBPF programs with additional semantic information, such as
>>      '__user' or '__rcu'.
>>
>>      For example, consider the Linux kernel function do_execve with the
>>      following declaration:
>>
>>        static int do_execve(struct filename *filename,
>>           const char __user *const __user *__argv,
>>           const char __user *const __user *__envp);
>>
>>      Here, __user could be defined with these annotations to record semantic
>>      information about the pointer parameters (e.g., they are user-provided)
>>      in DWARF and BTF information. Other kernel facilities such as the eBPF
>>      verifier can read the tags and make use of the information.
>>
>> 2)  Conveying the tags in the generated DWARF debug info.
>>
>>      The main motivation for emitting the tags in DWARF is that the Linux
>>      kernel generates its BTF information via pahole, using DWARF as a source:
>>
>>          +--------+  BTF                  BTF   +----------+
>>          | pahole |-------> vmlinux.btf ------->| verifier |
>>          +--------+                             +----------+
>>              ^                                        ^
>>              |                                        |
>>        DWARF |                                    BTF |
>>              |                                        |
>>           vmlinux                              +-------------+
>>           module1.ko                           | BPF program |
>>           module2.ko                           +-------------+
>>             ...
>>
>>      This is because:
>>
>>      a)  Unlike GCC, LLVM will only generate BTF for BPF programs.
>>
>>      b)  GCC can generate BTF for whatever target with -gbtf, but there is no
>>          support for linking/deduplicating BTF in the linker.
>>
>>      In the scenario above, the verifier needs access to the pointer tags of
>>      both the kernel types/declarations (conveyed in the DWARF and translated
>>      to BTF by pahole) and those of the BPF program (available directly in
>>      BTF).
>>
>>      Another motivation for having the tag information in DWARF, unrelated to
>>      BPF and BTF, is that the drgn project (another DWARF consumer) also
>>      wants to benefit from these tags in order to differentiate between
>>      different kinds of pointers in the kernel.
>>
>> 3)  Conveying the tags in the generated BTF debug info.
>>
>>      This is easy: the main purpose of having this info in BTF is for the
>>      compiled eBPF programs. The kernel verifier can then access the tags
>>      of pointers used by the eBPF programs.
>>
>>
>> For more information about these tags and the motivation behind them, please
>> refer to the following Linux kernel discussions:
>>
>>    https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
>>    https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
>>    https://lore.kernel.org/bpf/20211112012604.1504583-1-...@fb.com/
>>
>>
>> Implementation Overview
>> =======================
>>
>> To enable these annotations, two new C language attributes are added:
>> __attribute__((debug_annotate_decl("foo"))) and
>> __attribute__((debug_annotate_type("bar"))). Both attributes accept a single
>> arbitrary string constant argument, which will be recorded in the generated
>> DWARF and/or BTF debug information. They have no effect on code generation.
>>
>> Note that we are not using the same attribute names as LLVM (btf_decl_tag and
>> btf_type_tag, respectively). While these attributes are functionally very
>> similar, they have grown beyond purely BTF-specific uses, so inclusion of
>> "btf" in the attribute name seems misleading.
>>
>> DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating
>> DWARF, declarations and types will be checked for the corresponding
>> attributes. If present, a DW_TAG_GNU_annotation DIE will be created as a
>> child of the DIE for the annotated type or declaration, one for each tag.
>> These DIEs link the arbitrary tag value to the item they annotate.
>>
>> For example, the following variable declaration:
>>
>>    #define __typetag1 __attribute__((debug_annotate_type ("typetag1")))
>>
>>    #define __decltag1 __attribute__((debug_annotate_decl ("decltag1")))
>>    #define __decltag2 __attribute__((debug_annotate_decl ("decltag2")))
>>
>>    int * __typetag1 x __decltag1 __decltag2;
> 
> Based on the above example
>          static int do_execve(struct filename *filename,
>            const char __user *const __user *__argv,
>            const char __user *const __user *__envp);
> 
> Should the above example instead be the following?
>      int __typetag1 * x __decltag1 __decltag2
> 

This example is not related to the one above. It is just meant to
show the behavior of both attributes. My apologies for not making
that clear.
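
For illustration only, here is a minimal sketch of how the __user marker
from the do_execve declaration might be defined on top of
debug_annotate_type. The tag string "user", the __has_attribute fallback,
and the count_user_strings helper are assumptions made for this example
and are not taken from the patch series. On a compiler without the
proposed attribute the macro expands to nothing, so the code still builds.

```c
#include <stddef.h>

/* Older compilers may lack __has_attribute; treat every attribute as absent. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Assumption: the tag string "user" is chosen for this sketch only. */
#if __has_attribute(debug_annotate_type)
#define __user __attribute__((debug_annotate_type("user")))
#else
#define __user /* attribute not available: no annotation is recorded */
#endif

/*
 * Hypothetical helper with the same parameter shape as do_execve's __argv.
 * It counts the entries that precede the terminating NULL pointer.
 */
size_t count_user_strings(const char __user *const __user *strv)
{
    size_t n = 0;

    while (strv[n] != NULL)
        n++;
    return n;
}
```

Since the attributes have no effect on code generation, the helper should
behave the same with or without them; only the emitted DWARF and/or BTF
would differ.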

>>
>> Produces the following DWARF information:
>>
>>   <1><1e>: Abbrev Number: 3 (DW_TAG_variable)
>>      <1f>   DW_AT_name        : x
>>      <21>   DW_AT_decl_file   : 1
>>      <22>   DW_AT_decl_line   : 7
>>      <23>   DW_AT_decl_column : 18
>>      <24>   DW_AT_type        : <0x49>
>>      <28>   DW_AT_external    : 1
>>      <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0      (DW_OP_addr: 0)
>>      <32>   DW_AT_sibling     : <0x49>
>>   <2><36>: Abbrev Number: 1 (User TAG value: 0x6000)
>>      <37>   DW_AT_name        : (indirect string, offset: 0xd6): debug_annotate_decl
>>      <3b>   DW_AT_const_value : (indirect string, offset: 0xcd): decltag2
>>   <2><3f>: Abbrev Number: 1 (User TAG value: 0x6000)
>>      <40>   DW_AT_name        : (indirect string, offset: 0xd6): debug_annotate_decl
>>      <44>   DW_AT_const_value : (indirect string, offset: 0x0): decltag1
>>   <2><48>: Abbrev Number: 0
>>   <1><49>: Abbrev Number: 4 (DW_TAG_pointer_type)
>>      <4a>   DW_AT_byte_size   : 8
>>      <4b>   DW_AT_type        : <0x5d>
>>      <4f>   DW_AT_sibling     : <0x5d>
>>   <2><53>: Abbrev Number: 1 (User TAG value: 0x6000)
>>      <54>   DW_AT_name        : (indirect string, offset: 0x9): debug_annotate_type
>>      <58>   DW_AT_const_value : (indirect string, offset: 0x1d): typetag1
>>   <2><5c>: Abbrev Number: 0
>>   <1><5d>: Abbrev Number: 5 (DW_TAG_base_type)
>>      <5e>   DW_AT_byte_size   : 4
>>      <5f>   DW_AT_encoding    : 5    (signed)
>>      <60>   DW_AT_name        : int
>>   <1><64>: Abbrev Number: 0
> 
> Maybe you can also show what dwarf debug_info looks like

I am not sure what you mean. This is the .debug_info section as output
by readelf -w. I did trim some information not relevant to the discussion,
such as the DW_TAG_compile_unit DIE, for brevity.

> 
>>
>> In the case of BTF, the annotations are recorded in two type kinds recently
>> added to the BTF specification: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
>> The above example declaration produces the following BTF information:
>>
>> [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>> [2] PTR '(anon)' type_id=3
>> [3] TYPE_TAG 'typetag1' type_id=1
>> [4] DECL_TAG 'decltag1' type_id=6 component_idx=-1
>> [5] DECL_TAG 'decltag2' type_id=6 component_idx=-1
>> [6] VAR 'x' type_id=2, linkage=global
>> [7] DATASEC '.bss' size=0 vlen=1
>>      type_id=6 offset=0 size=8 (VAR 'x')
>>
>>
> [...]
