Hello,

Nice read; see some comments below.
On Fri, Oct 06, 2017 at 11:34:30AM -0400, Steven Rostedt wrote:
> On Fri, 6 Oct 2017 13:49:59 +0900
> Masami Hiramatsu <mhira...@kernel.org> wrote:
>
> > Steve, could you write a documentation how to use ftrace callback?
> > I think I should update the Documentation/kprobes.txt so that jprobe
> > user can easily migrate on that.
>
> I decided to do this now. Here's a first draft. What do you think?
>
> -- Steve
>
> Using ftrace to hook to functions
> =================================
>
> Copyright 2017 VMware Inc.
> Author: Steven Rostedt <srost...@goodmis.org>
> License: The GNU Free Documentation License, Version 1.2
>   (dual licensed under the GPL v2)
>
> Written for: 4.14
>
> Introduction
> ------------
>
> The ftrace infrastructure was originally created to attach hooks to the
> beginning of functions in order to record and trace the flow of the kernel.
> But hooks to the start of a function can have other use cases, such as
> live kernel patching or security monitoring. This document describes
> how to use ftrace to implement your own function hooks.
>
>
> The ftrace context
> ==================
>
> WARNING: The ability to add a callback to almost any function within the
> kernel comes with risks. A callback can be called from any context
> (normal, softirq, irq, and NMI). Callbacks can also be called just before
> going to idle, during CPU bring up and takedown, or going to user space.
> This requires extra care with what can be done inside a callback. A callback
> can be called outside the protective scope of RCU.
>
> The ftrace infrastructure has some protections against recursion and RCU,
> but one must still be very careful how they use the callbacks.
>
>
> The ftrace_ops structure
> ========================
>
> To register a function callback, a ftrace_ops is required. This structure
> is used to tell ftrace what function should be called as the callback
> as well as what protections the callback will perform and not require
> ftrace to handle.
>
> Only the fields shown below need to be set when registering an
> ftrace_ops with ftrace. The rest should be NULL.
>
>  struct ftrace_ops ops = {
>        .func    = my_callback_func,
>        .flags   = MY_FTRACE_FLAGS,
>        .private = any_private_data_structure,
>  };
>
> Both .flags and .private are optional. Only .func is required.
>
> To enable tracing, call:
>
>     register_ftrace_function(&ops);

Maybe it would help to have a small section on 'The register function'
below to answer questions like: is it possible to make changes to the
filter after calling register_ftrace_function(), or do you need to call
register_ftrace_function() again?

> To disable tracing, call:
>
>     unregister_ftrace_function(&ops);
>
>
> The callback function
> =====================
>
> The prototype of the callback function is as follows (as of v4.14):
>
>    void callback_func(unsigned long ip, unsigned long parent_ip,
>                       struct ftrace_ops *op, struct pt_regs *regs);
>
> @ip - This is the instruction pointer of the function that is being traced
>       (where the fentry or mcount is within the function).
>
> @parent_ip - This is the instruction pointer of the function that called
>       the function being traced (where the call of the function occurred).
>
> @op - This is a pointer to the ftrace_ops that was used to register the
>       callback. This can be used to pass data to the callback via the
>       private pointer.
>
> @regs - If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
>       flags are set in the ftrace_ops structure, then this will be pointing
>       to the pt_regs structure like it would be if a breakpoint was placed
>       at the start of the function where ftrace was tracing. Otherwise it
>       either contains garbage or is NULL.
>
>
> The ftrace FLAGS
> ================
>
> The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
> Some of the flags are used for internal infrastructure of ftrace, but the
> ones that users should be aware of are the following:
>
> (All of these are prefixed with FTRACE_OPS_FL_)
>
> PER_CPU - When set, the callback can be enabled or disabled per CPU with
>     the following functions:
>
>     void ftrace_function_local_enable(struct ftrace_ops *ops);
>     void ftrace_function_local_disable(struct ftrace_ops *ops);
>
>     These two functions must be called with preemption disabled.
>
> SAVE_REGS - If the callback requires reading or modifying the pt_regs
>     passed to the callback, then it must set this flag. Registering
>     a ftrace_ops with this flag set on an architecture that does not
>     support passing of pt_regs to the callback will fail.
>
> SAVE_REGS_IF_SUPPORTED - Similar to SAVE_REGS, but registering a
>     ftrace_ops with this flag set on an architecture that does not
>     support passing of regs will not fail. Instead, the callback must
>     check whether regs is NULL to determine if the architecture supports it.
>
> RECURSION_SAFE - By default, a wrapper is added around the callback to
>     make sure that recursion of the function does not occur. That is,
>     if a function within the callback itself is also traced, ftrace
>     will prevent the callback from being called again. But this wrapper
>     adds some overhead, and if the callback is safe from recursion,
>     it can set this flag to disable the ftrace protection.
>
> IPMODIFY - Requires SAVE_REGS set. If the callback is to "hijack" the
>     traced function (have another function called instead of the traced
>     function), it requires setting this flag. This is what live kernel
>     patches use. Without this flag, pt_regs->ip cannot be modified.
>     Note, only one ftrace_ops with IPMODIFY set may be registered to
>     any given function at a time.
>
> RCU - If this is set, then the callback will only be called by functions
>     where RCU is "watching". This is required if the callback function
>     performs any rcu_read_lock() operation.
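To make the RECURSION_SAFE description concrete, here is a small userspace model of what the default wrapper does. It is not kernel code: hook_fn, guarded_hook(), traced_func() and my_hook() are hypothetical names, the ip value is made up, and the real guard is more involved (it tracks each context, such as normal, softirq, irq and NMI, separately rather than using one thread-local flag).

```c
#include <stdbool.h>

/* Userspace model only; every name here is hypothetical. */
typedef void (*hook_fn)(unsigned long ip, unsigned long parent_ip);

static _Thread_local bool in_hook; /* set while the user callback runs */
static hook_fn user_hook;          /* the "registered" callback */
static int hook_calls;             /* times the user callback actually ran */

/* Models the wrapper added when RECURSION_SAFE is NOT set. */
static void guarded_hook(unsigned long ip, unsigned long parent_ip)
{
	if (in_hook)
		return;            /* re-entered from inside the callback: drop it */
	in_hook = true;
	user_hook(ip, parent_ip);
	in_hook = false;
}

/* Stand-in for a traced function; 0x1000 is a made-up ip. */
static void traced_func(unsigned long caller)
{
	guarded_hook(0x1000, caller);
	/* ... body of the traced function would run here ... */
}

/* A callback that itself calls a traced function. */
static void my_hook(unsigned long ip, unsigned long parent_ip)
{
	(void)parent_ip;
	hook_calls++;
	traced_func(ip);           /* would recurse forever without the guard */
}
```

With user_hook set to my_hook, one call to traced_func() runs the callback exactly once: the nested traced_func() call made from inside my_hook() is dropped by the guard. A callback that sets RECURSION_SAFE is promising that it never gets into this situation, so the guard can be skipped.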
>
>
> Filtering what functions to trace
> =================================
>
> If a callback is only to be called from specific functions, a filter must be
> set up. The filters are added by name, or by ip if it is known.
>
>   int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
>                         int len, int reset);
>
> @ops - the ops to set the filter with.
> @buf - the string that holds the function filter text.
> @len - the length of the string.
> @reset - non-zero to reset all filters before applying this filter.
>
> Filters denote which functions should be enabled when tracing is enabled.
> If @buf is NULL and reset is set, all functions will be enabled for tracing.
>
>
> The @buf can also be a glob expression to enable all functions that
> match a specific pattern.
>
> To just trace the schedule function:
>
>    ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0);
>
> To add more functions, call ftrace_set_filter() more than once with the
> @reset parameter set to zero. To remove the current filter and replace it
> with new functions to trace, have @reset be non-zero.
>
> Sometimes more than one function has the same name. To trace just a specific
> function in this case, ftrace_set_filter_ip() can be used.
>
>    ret = ftrace_set_filter_ip(&ops, ip, 0, 0);
>
> Note that the ip must be the address where the call to fentry or mcount is
> located in the function.
>
> If a glob is used to set the filter, ftrace_set_notrace() can also be used
> to remove unwanted matches.
>
>    int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
>                           int len, int reset);
>
> This takes the same parameters as ftrace_set_filter() but will add the
> functions it finds to not be traced. This doesn't remove them from the
> filter itself, but keeps them from being traced. If @reset is set,
> the filter is cleaded but the functions that match @buf will still not

'cleared'?

> be traced (the callback will not be called on those functions).
This is a bit confusing. I guess it means 'the existing filter is cleared
and the filter *will match all* functions excluding those that match @buf'.

-Stafford
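A small userspace model of how the filter and notrace lists combine may help pin down the @reset semantics discussed above. It is a sketch under stated assumptions, not kernel code: it assumes (as kernel/trace/ftrace.c does with its separate filter and notrace hashes) that the two lists are stored independently, that an empty filter list means "all functions", and that a notrace match always wins. POSIX fnmatch() stands in for ftrace's own glob matching, and struct pattern_list, struct model_ops, list_set(), model_is_traced() and MAX_PATTERNS are hypothetical names.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PATTERNS 8 /* hypothetical limit for this model */

struct pattern_list {
	const char *pat[MAX_PATTERNS];
	size_t n;
};

struct model_ops {
	struct pattern_list filter;  /* what ftrace_set_filter() edits */
	struct pattern_list notrace; /* what ftrace_set_notrace() edits */
};

/* Models @reset: non-zero empties THIS list first; a NULL glob adds nothing. */
static bool list_set(struct pattern_list *l, const char *glob, int reset)
{
	if (reset)
		l->n = 0;
	if (!glob)
		return true;
	if (l->n == MAX_PATTERNS)
		return false;        /* model limit reached */
	l->pat[l->n++] = glob;
	return true;
}

static bool list_match(const struct pattern_list *l, const char *func)
{
	for (size_t i = 0; i < l->n; i++)
		if (fnmatch(l->pat[i], func, 0) == 0)
			return true;
	return false;
}

/* Would the callback fire for @func? Empty filter = all functions;
 * a notrace match always overrides the filter. */
static bool model_is_traced(const struct model_ops *ops, const char *func)
{
	if (ops->filter.n > 0 && !list_match(&ops->filter, func))
		return false;
	return !list_match(&ops->notrace, func);
}
```

Under these assumptions, setting the filter to "sched*", adding "schedule_tail" to notrace, and then calling the notrace setter with "schedule" and a non-zero reset leaves schedule_tail traced again while schedule is excluded; the "sched*" filter itself is untouched, so do_exit stays untraced throughout. In other words, @reset on ftrace_set_notrace() would clear only the notrace list.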