The patches in this set are part of an effort to support tracing tools in ways that go beyond attaching programs to probes and events and working with the context data they provide. They are also aimed at avoiding the need to add a new helper for every piece of task information that tracers may want to include in trace data (as was discussed at the Linux Plumbers Conference BPF mini-conference track last year).

One of the main characteristics of tracers is that a variety of information can be collected at the time a probe fires. When using a BPF program to implement the actions to be taken when a probe fires, the most natural source for a large part of this information (task information, probe data, tracer state) is the context that is associated with the BPF program. It is also possible to obtain (most of) this information by means of full-access helper calls like probe_read(), but that isn't something you want to make available to an unprivileged user.

So we have two areas where BPF programs can be very useful and powerful:

 - BPF programs that are attached to a probe or event, operating on the context provided by the probe or event
 - BPF programs that implement actions to be taken within the context of a tracing tool

The Linux kernel provides a wealth of probes and events to which we can attach BPF programs. These event sources do not have any knowledge of the tracing tool that might be using them. Being able to use them from any tracing tool is definitely preferable to implementing your own probes and events. We also do not want to 'teach' all the existing probes and events about every possible BPF program type that would like to get called from them.

So, to illustrate what we're trying to accomplish, consider a kprobe. We can attach a BPF program to it and it will be called with a 'struct pt_regs' context. From the side of our tracing tool, we also want information about the task that triggered the kprobe to fire (beyond what is currently available through helpers), and we want to be able to access that information from a BPF program that implements what should happen when the probe fires (e.g. recording the event and the specific data that we are interested in).

The 2nd patch in this set implements a very basic generic tracer program type (BPF_PROG_TYPE_GTRACE) that provides the pt_regs data and select task data in its context.
We cannot attach a program of this type to a kprobe because that probe supports BPF_PROG_TYPE_KPROBE instead. The 1st patch in this set implements a mechanism to solve this issue: it allows a tail-call from one program type to another if the callee type supports conversion of a caller context into a context for the callee. So, in the sample, BPF_PROG_TYPE_GTRACE provides can_cast() and cast_context() functions that support converting a BPF_PROG_TYPE_KPROBE context into a BPF_PROG_TYPE_GTRACE context.

The workflow a tracer can use is:

 1. The tracer creates a program array map and inserts one or more programs of type BPF_PROG_TYPE_GTRACE. These programs implement whatever actions are to be taken when a specific probe fires. This step must be done first so that the program array is initialized with the correct program type. This type needs to be known so that compatibility checking can be performed when the calling program is verified.
 2. The tracer loads a program of type BPF_PROG_TYPE_KPROBE and attaches it to the kprobe we're interested in. This program contains a tail-call to a BPF_PROG_TYPE_GTRACE program in the program array.
 3. The kprobe fires and executes our program (of type BPF_PROG_TYPE_KPROBE).
    3.1 The program performs whatever operations we need to have done at the level of the probe firing.
    3.2 The program performs a tail-call into a program from our program array.
        3.2.1 The execution of the tail-call instruction causes a call to be made to the cast_context() function provided by BPF_PROG_TYPE_GTRACE. This function creates a context structure, populates it with task information, and copies in the pt_regs data from the context that was passed to the BPF_PROG_TYPE_KPROBE program.
        3.2.2 The new context is assigned to R1 (replacing the original context), and execution is transferred to the called program.

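To make the control flow in step 3.2 concrete, here is a minimal user-space model of a tail-call with context conversion. It is only a sketch and not the code from the patches: every identifier below (prog_type, regs_model, gtrace_ctx_model, prog_model, tail_call_model, and so on) is hypothetical, the register structure is a two-field stand-in for struct pt_regs, the task information is reduced to a single pid, and the caller's program type is passed as a plain function argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; none of these are kernel definitions. */
enum prog_type { PROG_TYPE_KPROBE, PROG_TYPE_GTRACE, PROG_TYPE_OTHER };

/* Two-field stand-in for struct pt_regs. */
struct regs_model {
    unsigned long ip;
    unsigned long sp;
};

/* Model of the callee context: register data plus select task data. */
struct gtrace_ctx_model {
    struct regs_model regs;
    int task_pid;
};

struct prog_model {
    enum prog_type type;
    int (*run)(void *ctx);
};

/* Model of can_cast(): only a kprobe context can be converted. */
static int gtrace_can_cast(enum prog_type from)
{
    return from == PROG_TYPE_KPROBE;
}

/* Model of cast_context(): build the callee context from the caller's. */
static int gtrace_cast_context(enum prog_type from, const void *src,
                               struct gtrace_ctx_model *dst, int current_pid)
{
    assert(src != NULL && dst != NULL);
    if (!gtrace_can_cast(from))
        return -1;
    memcpy(&dst->regs, src, sizeof(dst->regs)); /* copy register data */
    dst->task_pid = current_pid;                /* add task information */
    return 0;
}

/*
 * Model of the tail-call: returns the callee's result, or -1 when the
 * call fails (missing callee, or incompatible type without conversion).
 */
static int tail_call_model(enum prog_type caller_type, void *caller_ctx,
                           const struct prog_model *callee, int current_pid)
{
    struct gtrace_ctx_model new_ctx;

    if (callee == NULL)
        return -1;
    if (callee->type == caller_type)
        return callee->run(caller_ctx); /* same type: context unchanged */
    if (callee->type != PROG_TYPE_GTRACE)
        return -1;                      /* no conversion known for this type */
    if (gtrace_cast_context(caller_type, caller_ctx, &new_ctx, current_pid))
        return -1;                      /* conversion not supported */
    return callee->run(&new_ctx);       /* callee sees the new context */
}

/* Example action program: reports the pid found in its context. */
static int record_pid_action(void *ctx)
{
    const struct gtrace_ctx_model *g = ctx;
    return g->task_pid;
}
```

In this model, a caller of type PROG_TYPE_KPROBE that tail-calls a PROG_TYPE_GTRACE program built around record_pid_action gets the pid back, while a caller of any other type gets -1, mirroring a tail-call that fails because no conversion is supported.
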
The implementation is done in such a way that existing tail-calls will work without any change, aside from the fact that the verifier inserts an instruction right before the tail-call. That instruction simply loads the BPF program type into R4. This ensures that at the time of the tail-call, the program type of the calling program can be passed to the cast_context() function. Knowledge about the program type of an executing program is not available anywhere else, and we need to know what context we're trying to convert from. The function prototype for the (pseudo-)helper bpf_tail_call declares only 3 arguments, so existing code is not affected by this internal use of R4.

Obviously, if there is no conversion function or the conversion is not supported, the tail-call will fail, because that situation is effectively the same as trying to call a program of an incompatible type.

The goal is to further extend the BPF_PROG_TYPE_GTRACE implementation to support what tracers commonly need. I am also looking at ways to extend this model further to allow more tracer-specific features without the need to add a BPF program type for every tracer.

Kris