On Thu, Nov 01, 2018 at 09:08:37PM +0000, Edward Cree wrote:
> I've spent a bit more time thinking about / sleeping on this, and I
> still think there's a major disagreement here.  Basically it seems
> like I'm saying "the design of BTF is wrong" and you're saying "but
> it's the design" (with the possible implication -- I'm not entirely
> sure -- of "but that's what DWARF does").
> So let's back away from the details about FUNC/PROTO, and talk in
> more general terms about what a BTF record means.
> There are two classes of things we might want to put in debug-info:
> * There exists a type T
> * I have an instance X (variable, subprogram, etc.) of type T
> Both of these may need to reference other types, and have the same
> space of possible things T could be, but there the similarity ends:
> they are semantically different things.
> Indeed, the only reason for any record of the first class is to
> define types referenced by records of the second class.  Some
> concrete examples of records of the second class are:
> 1) I have a map named "foo" with key-type T1 and value-type T2
> 2) I have a subprogram named "bar" with prototype T3
> 3) I am using stack slot fp-8 to store a value of type T4
> 4) I am casting ctx+8 to a pointer type T5 before dereferencing it
> Currently we have (1) and this patch series adds (2), both done
> through records that look like they are just defining a type (i.e.
> the first class of record) but have 'magic' semantics (in the case
> of (1), special names of the form ____btf_map_foo.  How anyone
> thought that was a clean and tasteful design is beyond me.)
> What IMHO the design *should* be, is that we have a 'types'
> subsection that *only* contains records of the first class, and
> then other subsections to hold records of the second class that
> reference records of the first class by ID.

Such a pure-type approach wouldn't be practical.
BTF is not pure type information. BTF is everything the verifier needs
to know to make safety decisions and that the bpf instruction set
doesn't carry.
For example, two anonymous structs:
  struct { int a; int b; } var1;
  struct { int c; int d; } var2;
are identical in layout from the C point of view, but from the BTF point
of view they are different. Names of the fields are an essential part of
BTF, because the purpose of BTF is to provide information about bpf
objects for debugging and safety reasons.
Similarly
  int (*) (void *src, void *dst, int len);
and
  int (*) (void *dst, void *src, int len);
are the same from the C and compiler point of view, but they are
different in BTF, because names carry information that needs to be
preserved.
The same goes for function declarations. The function name and argument
names are part of the 'type description'.
We shouldn't be using the word 'type' in its pure form, otherwise it
will cause confusion like this thread demonstrated.

Beyond prog names expressed in BTF we're adding global variables
support. They will be expressed in BTF as a new KIND.
Think of all global variables in a single .c file as fields of an
anonymous structure. They have offsets from the beginning of .bss,
sizes, further references into btf_type_ids and, most importantly,
names.

Another thing we're working on is spin_lock support. There we also have
to rely on BTF to make sure that the bytes of a map's value or cgroup
local storage that belong to a spin_lock will be masked for
lookup/update calls.
  typedef u32 bpf_spin_lock;
will be recognized by the verifier by its name.
Maybe we will introduce a new KIND_ for spin_lock too and convert the
name into a KIND in the btf writer. That is TBD.
The main point is that names of types, variables, and functions have to
be expressed in BTF as one coherent graph of information.

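To make the anonymous-struct example above concrete, here is a minimal
sketch in C. The record layout (struct member_desc, struct struct_desc)
and the helper names are hypothetical and much simpler than the real BTF
encoding, which stores string-table offsets and type ids rather than
inline strings. It only illustrates that two descriptions can agree on
layout and still differ once member names are part of the description.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified member record (not the real BTF encoding). */
struct member_desc {
    const char *name;          /* member name, e.g. "a" */
    const char *type;          /* type spelled as text for simplicity */
    unsigned int offset_bits;  /* offset from the start of the struct */
};

/* Hypothetical description of an anonymous two-member struct. */
struct struct_desc {
    unsigned int nr_members;
    struct member_desc members[2];
};

/* struct { int a; int b; } var1;  (assuming a 32-bit int) */
const struct struct_desc var1_desc = {
    2, { { "a", "int", 0 }, { "b", "int", 32 } }
};

/* struct { int c; int d; } var2;  (assuming a 32-bit int) */
const struct struct_desc var2_desc = {
    2, { { "c", "int", 0 }, { "d", "int", 32 } }
};

/* Compare member types and offsets only; names are ignored. */
int same_layout(const struct struct_desc *x, const struct struct_desc *y)
{
    assert(x && y);
    if (x->nr_members != y->nr_members)
        return 0;
    for (unsigned int i = 0; i < x->nr_members; i++) {
        if (strcmp(x->members[i].type, y->members[i].type) != 0 ||
            x->members[i].offset_bits != y->members[i].offset_bits)
            return 0;
    }
    return 1;
}

/* Compare layout and member names, the way a named description would. */
int same_description(const struct struct_desc *x, const struct struct_desc *y)
{
    if (!same_layout(x, y))
        return 0;
    for (unsigned int i = 0; i < x->nr_members; i++) {
        if (strcmp(x->members[i].name, y->members[i].name) != 0)
            return 0;
    }
    return 1;
}
```

With these definitions, same_layout(&var1_desc, &var2_desc) returns 1
while same_description(&var1_desc, &var2_desc) returns 0, which is the
distinction the names are there to preserve.
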
Splitting pure types into one section, variables into another, and
functions into yet another is not practical, since the same modifiers
(like const or volatile) need to be applied to variables and functions.
In the end all sections would have the same style of encoding, so there
is no need to duplicate the encoding three times; it's cleaner to encode
all of them BTF-style via different KINDs.

vmlinux's BTF will include all global variables and functions, so that
tracing scripts can reference a particular variable or function argument
by name, making kernel debugging easier and less error prone.
Consider that right now a typical invocation of bcc's trace.py script
looks like:
  trace.py -I linux/skbuff.h -I net/sock.h \
    'skb_recv_datagram(struct sock *sk, unsigned int flags, int noblock, int *err) \
    "sk_recv_q:%llx next:%llx prev:%llx" \
    &sk->sk_receive_queue,sk->sk_receive_queue.next,sk->sk_receive_queue.prev'
where the func declaration is copy-pasted from the C source and there
are no guarantees that the argument names and their order match what
vmlinux actually has.
With fully named BTF of vmlinux the tracing scripts will become more
robust.

Note that BTF_KIND_FUNC is pretty much the same as BTF_KIND_STRUCT with
the addition of a return type.
Kernel-specific things like the per-cpu attribute of a variable will be
a BTF KIND too. This is information that tooling and the kernel need,
and the current BTF graph design is perfectly suited to carry it.
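
As a rough illustration of the last two points, the sketch below models
a function description as a struct-like list of named members plus a
return type, and resolves an argument by name instead of relying on a
copy-pasted declaration. The types struct param_desc and struct
func_desc and the helper find_param() are hypothetical and are not the
BTF_KIND_FUNC wire format; the parameter list is the one from the
trace.py command above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical named parameter, analogous to a named struct member. */
struct param_desc {
    const char *name;
    const char *type;
};

/* Hypothetical function record: a member list plus a return type. */
struct func_desc {
    const char *name;
    const char *ret_type;      /* the addition over a struct-like record */
    unsigned int nr_params;
    struct param_desc params[4];
};

/* Description matching the declaration used in the trace.py example. */
const struct func_desc skb_recv_datagram_desc = {
    "skb_recv_datagram",
    "struct sk_buff *",
    4,
    {
        { "sk",      "struct sock *" },
        { "flags",   "unsigned int"  },
        { "noblock", "int"           },
        { "err",     "int *"         },
    },
};

/* Return the zero-based position of the named argument, or -1 if the
 * function has no argument with that name. */
int find_param(const struct func_desc *fn, const char *name)
{
    assert(fn && name);
    for (unsigned int i = 0; i < fn->nr_params; i++) {
        if (strcmp(fn->params[i].name, name) == 0)
            return (int)i;
    }
    return -1;
}
```

A tracing tool built on such a description could ask for "sk" or
"noblock" by name and get the position from the recorded description, so
a renamed or reordered argument shows up as a failed lookup rather than
as silently wrong output.
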