[PATCH v3 0/5] tracing/probes: Support tracepoint events on modules
Hi, This is the 3rd version of the raw tracepoint events on modules. The previous version is here; https://lore.kernel.org/all/fbfec8d9-d0ed-4384-bbd2-dd5c1e568...@efficios.com/ This version supports tracepoint event on unloaded modules according to Mathies' suggestion ([2/5],[4/5] and part of [5/5]) . The concern about blocking module unload by instrumentation is TBD. Note, to support tracepoints in the unloaded modules, tracepoint event can not check the given tracepoint is really defined or not. So unless CONFIG_MODULES=n, it does not check the tracepoint existence. IOW, user can specify any tracepoint name for tracepoint events. It will be just ignored. You can download this series from; https://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git topic/tprobe-on-module Thank you, --- Masami Hiramatsu (Google) (5): tracepoint: Support iterating over tracepoints on modules tracepoint: Support tterating tracepoints in a loading module tracing/fprobe: Support raw tracepoint events on modules tracing/fprobe: Support raw tracepoints on future loaded modules sefltests/tracing: Add a test for tracepoint events on modules include/linux/tracepoint.h | 20 ++ kernel/trace/trace_fprobe.c| 179 +++- kernel/tracepoint.c| 42 + tools/testing/selftests/ftrace/config |1 .../test.d/dynevent/add_remove_tprobe_module.tc| 61 +++ .../ftrace/test.d/dynevent/tprobe_syntax_errors.tc |1 6 files changed, 254 insertions(+), 50 deletions(-) create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_tprobe_module.tc -- Masami Hiramatsu (Google)
[PATCH v3 1/5] tracepoint: Support iterating over tracepoints on modules
From: Masami Hiramatsu (Google) Add for_each_module_tracepoint() for iterating over tracepoints on modules. This is similar to the for_each_kernel_tracepoint() but only for the tracepoints on modules (not including kernel built-in tracepoints). Signed-off-by: Masami Hiramatsu (Google) --- Changes in v3: - Add kerneldoc for for_each_module_tracepoint. --- include/linux/tracepoint.h |7 +++ kernel/tracepoint.c| 21 + 2 files changed, 28 insertions(+) diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 6be396bb4297..837fcf8ec0d5 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -64,6 +64,8 @@ struct tp_module { bool trace_module_has_bad_taint(struct module *mod); extern int register_tracepoint_module_notifier(struct notifier_block *nb); extern int unregister_tracepoint_module_notifier(struct notifier_block *nb); +void for_each_module_tracepoint(void (*fct)(struct tracepoint *, void *), + void *priv); #else static inline bool trace_module_has_bad_taint(struct module *mod) { @@ -79,6 +81,11 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb) { return 0; } +static inline +void for_each_module_tracepoint(void (*fct)(struct tracepoint *, void *), + void *priv) +{ +} #endif /* CONFIG_MODULES */ /* diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c index 8d1507dd0724..bed4aad36d92 100644 --- a/kernel/tracepoint.c +++ b/kernel/tracepoint.c @@ -735,6 +735,27 @@ static __init int init_tracepoints(void) return ret; } __initcall(init_tracepoints); + +/** + * for_each_module_tracepoint - iteration on all tracepoints in all modules + * @fct: callback + * @priv: private data + */ +void for_each_module_tracepoint(void (*fct)(struct tracepoint *tp, void *priv), + void *priv) +{ + struct tp_module *tp_mod; + struct module *mod; + + mutex_lock(&tracepoint_module_list_mutex); + list_for_each_entry(tp_mod, &tracepoint_module_list, list) { + mod = tp_mod->mod; + for_each_tracepoint_range(mod->tracepoints_ptrs, + mod->tracepoints_ptrs + mod->num_tracepoints, + fct, priv); + } + mutex_unlock(&tracepoint_module_list_mutex); +} #endif /* CONFIG_MODULES */ /**
[PATCH v3 2/5] tracepoint: Support tterating tracepoints in a loading module
From: Masami Hiramatsu (Google) Add for_each_tracepoint_in_module() function to iterate tracepoints in a module. This API is needed for handling tracepoints in a loading module from tracepoint_module_notifier callback function. This also update for_each_module_tracepoint() to pass the module to callback function so that it can find module easily. Signed-off-by: Masami Hiramatsu (Google) --- Changes in v3: - Newly added. --- include/linux/tracepoint.h | 17 +++-- kernel/tracepoint.c| 37 + 2 files changed, 44 insertions(+), 10 deletions(-) diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 837fcf8ec0d5..93a9f3070b48 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -64,8 +64,13 @@ struct tp_module { bool trace_module_has_bad_taint(struct module *mod); extern int register_tracepoint_module_notifier(struct notifier_block *nb); extern int unregister_tracepoint_module_notifier(struct notifier_block *nb); -void for_each_module_tracepoint(void (*fct)(struct tracepoint *, void *), +void for_each_module_tracepoint(void (*fct)(struct tracepoint *, + struct module *, void *), void *priv); +void for_each_tracepoint_in_module(struct module *, + void (*fct)(struct tracepoint *, + struct module *, void *), + void *priv); #else static inline bool trace_module_has_bad_taint(struct module *mod) { @@ -82,10 +87,18 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb) return 0; } static inline -void for_each_module_tracepoint(void (*fct)(struct tracepoint *, void *), +void for_each_module_tracepoint(void (*fct)(struct tracepoint *, + struct module *, void *), void *priv) { } +static inline +void for_each_tracepoint_in_module(struct module *mod, + void (*fct)(struct tracepoint *, + struct module *, void *), + void *priv) +{ +} #endif /* CONFIG_MODULES */ /* diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c index bed4aad36d92..8879da16ef4d 100644 --- a/kernel/tracepoint.c +++ b/kernel/tracepoint.c @@ -736,24 +736,45 @@ static __init int init_tracepoints(void) } __initcall(init_tracepoints); +/** + * for_each_tracepoint_in_module - iteration on all tracepoints in a module + * @mod: module + * @fct: callback + * @priv: private data + */ +void for_each_tracepoint_in_module(struct module *mod, + void (*fct)(struct tracepoint *tp, + struct module *mod, void *priv), + void *priv) +{ + tracepoint_ptr_t *begin, *end, *iter; + + lockdep_assert_held(&tracepoint_module_list_mutex); + + if (!mod) + return; + + begin = mod->tracepoints_ptrs; + end = mod->tracepoints_ptrs + mod->num_tracepoints; + + for (iter = begin; iter < end; iter++) + fct(tracepoint_ptr_deref(iter), mod, priv); +} + /** * for_each_module_tracepoint - iteration on all tracepoints in all modules * @fct: callback * @priv: private data */ -void for_each_module_tracepoint(void (*fct)(struct tracepoint *tp, void *priv), +void for_each_module_tracepoint(void (*fct)(struct tracepoint *tp, +struct module *mod, void *priv), void *priv) { struct tp_module *tp_mod; - struct module *mod; mutex_lock(&tracepoint_module_list_mutex); - list_for_each_entry(tp_mod, &tracepoint_module_list, list) { - mod = tp_mod->mod; - for_each_tracepoint_range(mod->tracepoints_ptrs, - mod->tracepoints_ptrs + mod->num_tracepoints, - fct, priv); - } + list_for_each_entry(tp_mod, &tracepoint_module_list, list) + for_each_tracepoint_in_module(tp_mod->mod, fct, priv); mutex_unlock(&tracepoint_module_list_mutex); } #endif /* CONFIG_MODULES */
[PATCH v3 3/5] tracing/fprobe: Support raw tracepoint events on modules
From: Masami Hiramatsu (Google) Support raw tracepoint event on module by fprobe events. Since it only uses for_each_kernel_tracepoint() to find a tracepoint, the tracepoints on modules are not handled. Thus if user specified a tracepoint on a module, it shows an error. This adds new for_each_module_tracepoint() API to tracepoint subsystem, and uses it to find tracepoints on modules. Reported-by: don Closes: https://lore.kernel.org/all/20240530215718.aeec973a1d0bf058d39cb...@kernel.org/ Signed-off-by: Masami Hiramatsu (Google) --- Changes in v2: - Fix build errors with CONFIG_MODULES=y. --- kernel/trace/trace_fprobe.c | 46 --- 1 file changed, 38 insertions(+), 8 deletions(-) diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c index 62e6a8f4aae9..8b1127e37da5 100644 --- a/kernel/trace/trace_fprobe.c +++ b/kernel/trace/trace_fprobe.c @@ -385,6 +385,7 @@ static struct trace_fprobe *alloc_trace_fprobe(const char *group, const char *event, const char *symbol, struct tracepoint *tpoint, + struct module *mod, int maxactive, int nargs, bool is_return) { @@ -405,6 +406,7 @@ static struct trace_fprobe *alloc_trace_fprobe(const char *group, tf->fp.entry_handler = fentry_dispatcher; tf->tpoint = tpoint; + tf->mod = mod; tf->fp.nr_maxactive = maxactive; ret = trace_probe_init(&tf->tp, event, group, false, nargs); @@ -895,8 +897,23 @@ static struct notifier_block tracepoint_module_nb = { struct __find_tracepoint_cb_data { const char *tp_name; struct tracepoint *tpoint; + struct module *mod; }; +static void __find_tracepoint_module_cb(struct tracepoint *tp, struct module *mod, void *priv) +{ + struct __find_tracepoint_cb_data *data = priv; + + if (!data->tpoint && !strcmp(data->tp_name, tp->name)) { + data->tpoint = tp; + data->mod = mod; + if (!try_module_get(data->mod)) { + data->tpoint = NULL; + data->mod = NULL; + } + } +} + static void __find_tracepoint_cb(struct tracepoint *tp, void *priv) { struct __find_tracepoint_cb_data *data = priv; @@ -905,14 +922,28 @@ static void __find_tracepoint_cb(struct tracepoint *tp, void *priv) data->tpoint = tp; } -static struct tracepoint *find_tracepoint(const char *tp_name) +/* + * Find a tracepoint from kernel and module. If the tracepoint is in a module, + * this increments the module refcount to prevent unloading until the + * trace_fprobe is registered to the list. After registering the trace_fprobe + * on the trace_fprobe list, the module refcount is decremented because + * tracepoint_probe_module_cb will handle it. + */ +static struct tracepoint *find_tracepoint(const char *tp_name, + struct module **tp_mod) { struct __find_tracepoint_cb_data data = { .tp_name = tp_name, + .mod = NULL, }; for_each_kernel_tracepoint(__find_tracepoint_cb, &data); + if (!data.tpoint && IS_ENABLED(CONFIG_MODULES)) { + for_each_module_tracepoint(__find_tracepoint_module_cb, &data); + *tp_mod = data.mod; + } + return data.tpoint; } @@ -996,6 +1027,7 @@ static int __trace_fprobe_create(int argc, const char *argv[]) char abuf[MAX_BTF_ARGS_LEN]; char *dbuf = NULL; bool is_tracepoint = false; + struct module *tp_mod = NULL; struct tracepoint *tpoint = NULL; struct traceprobe_parse_context ctx = { .flags = TPARG_FL_KERNEL | TPARG_FL_FPROBE, @@ -1080,7 +1112,7 @@ static int __trace_fprobe_create(int argc, const char *argv[]) if (is_tracepoint) { ctx.flags |= TPARG_FL_TPOINT; - tpoint = find_tracepoint(symbol); + tpoint = find_tracepoint(symbol, &tp_mod); if (!tpoint) { trace_probe_log_set_index(1); trace_probe_log_err(0, NO_TRACEPOINT); @@ -1110,8 +1142,8 @@ static int __trace_fprobe_create(int argc, const char *argv[]) goto out; /* setup a probe */ - tf = alloc_trace_fprobe(group, event, symbol, tpoint, maxactive, - argc, is_return); + tf = alloc_trace_fprobe(group, event, symbol, tpoint, tp_mod, + maxactive, argc, is_return); if (IS_ERR(tf)) { ret = PTR_ERR(tf); /* This must return -ENOMEM, else there is a bug */ @@ -1119,10 +1151,6 @@ static int __trace_fprobe_create(
[PATCH v3 4/5] tracing/fprobe: Support raw tracepoints on future loaded modules
From: Masami Hiramatsu (Google) Support raw tracepoint events on future loaded (unloaded) modules. This allows user to create raw tracepoint events which can be used from module's __init functions. Note: since the kernel does not have any information about the tracepoints in the unloaded modules, fprobe events can not check whether the tracepoint exists nor extend the BTF based arguments. Suggested-by: Mathieu Desnoyers Signed-off-by: Masami Hiramatsu (Google) --- Changes in v3: - Newly added. --- kernel/trace/trace_fprobe.c| 151 +--- .../ftrace/test.d/dynevent/tprobe_syntax_errors.tc |1 2 files changed, 101 insertions(+), 51 deletions(-) diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c index 8b1127e37da5..a079abd8955b 100644 --- a/kernel/trace/trace_fprobe.c +++ b/kernel/trace/trace_fprobe.c @@ -21,6 +21,7 @@ #define FPROBE_EVENT_SYSTEM "fprobes" #define TRACEPOINT_EVENT_SYSTEM "tracepoints" #define RETHOOK_MAXACTIVE_MAX 4096 +#define TRACEPOINT_STUB ERR_PTR(-ENOENT) static int trace_fprobe_create(const char *raw_command); static int trace_fprobe_show(struct seq_file *m, struct dyn_event *ev); @@ -674,6 +675,24 @@ static int unregister_fprobe_event(struct trace_fprobe *tf) return trace_probe_unregister_event_call(&tf->tp); } +static int __regsiter_tracepoint_fprobe(struct trace_fprobe *tf) +{ + struct tracepoint *tpoint = tf->tpoint; + unsigned long ip = (unsigned long)tpoint->probestub; + int ret; + + /* +* Here, we do 2 steps to enable fprobe on a tracepoint. +* At first, put __probestub_##TP function on the tracepoint +* and put a fprobe on the stub function. +*/ + ret = tracepoint_probe_register_prio_may_exist(tpoint, + tpoint->probestub, NULL, 0); + if (ret < 0) + return ret; + return register_fprobe_ips(&tf->fp, &ip, 1); +} + /* Internal register function - just handle fprobe and flags */ static int __register_trace_fprobe(struct trace_fprobe *tf) { @@ -700,18 +719,12 @@ static int __register_trace_fprobe(struct trace_fprobe *tf) tf->fp.flags |= FPROBE_FL_DISABLED; if (trace_fprobe_is_tracepoint(tf)) { - struct tracepoint *tpoint = tf->tpoint; - unsigned long ip = (unsigned long)tpoint->probestub; - /* -* Here, we do 2 steps to enable fprobe on a tracepoint. -* At first, put __probestub_##TP function on the tracepoint -* and put a fprobe on the stub function. -*/ - ret = tracepoint_probe_register_prio_may_exist(tpoint, - tpoint->probestub, NULL, 0); - if (ret < 0) - return ret; - return register_fprobe_ips(&tf->fp, &ip, 1); + + /* This tracepoint is not loaded yet */ + if (tf->tpoint == TRACEPOINT_STUB) + return 0; + + return __regsiter_tracepoint_fprobe(tf); } /* TODO: handle filter, nofilter or symbol list */ @@ -864,36 +877,6 @@ static int register_trace_fprobe(struct trace_fprobe *tf) return ret; } -#ifdef CONFIG_MODULES -static int __tracepoint_probe_module_cb(struct notifier_block *self, - unsigned long val, void *data) -{ - struct tp_module *tp_mod = data; - struct trace_fprobe *tf; - struct dyn_event *pos; - - if (val != MODULE_STATE_GOING) - return NOTIFY_DONE; - - mutex_lock(&event_mutex); - for_each_trace_fprobe(tf, pos) { - if (tp_mod->mod == tf->mod) { - tracepoint_probe_unregister(tf->tpoint, - tf->tpoint->probestub, NULL); - tf->tpoint = NULL; - tf->mod = NULL; - } - } - mutex_unlock(&event_mutex); - - return NOTIFY_DONE; -} - -static struct notifier_block tracepoint_module_nb = { - .notifier_call = __tracepoint_probe_module_cb, -}; -#endif /* CONFIG_MODULES */ - struct __find_tracepoint_cb_data { const char *tp_name; struct tracepoint *tpoint; @@ -906,10 +889,12 @@ static void __find_tracepoint_module_cb(struct tracepoint *tp, struct module *mo if (!data->tpoint && !strcmp(data->tp_name, tp->name)) { data->tpoint = tp; - data->mod = mod; - if (!try_module_get(data->mod)) { - data->tpoint = NULL; - data->mod = NULL; + if (!data->mod) { + data->mod = mod; + if (!try_module_get(data->mod)) { + data->tpoint = NULL; + data->mod = NULL; + } }
[PATCH v3 5/5] sefltests/tracing: Add a test for tracepoint events on modules
From: Masami Hiramatsu (Google) Add a test case for tracepoint events on modules. This checks if it can add and remove the events correctly. Signed-off-by: Masami Hiramatsu (Google) --- Changes in v3: - Add not-loaded module test. --- tools/testing/selftests/ftrace/config |1 .../test.d/dynevent/add_remove_tprobe_module.tc| 61 2 files changed, 62 insertions(+) create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_tprobe_module.tc diff --git a/tools/testing/selftests/ftrace/config b/tools/testing/selftests/ftrace/config index 048a312abf40..544de0db5f58 100644 --- a/tools/testing/selftests/ftrace/config +++ b/tools/testing/selftests/ftrace/config @@ -20,6 +20,7 @@ CONFIG_PREEMPT_TRACER=y CONFIG_PROBE_EVENTS_BTF_ARGS=y CONFIG_SAMPLES=y CONFIG_SAMPLE_FTRACE_DIRECT=m +CONFIG_SAMPLE_TRACE_EVENTS=m CONFIG_SAMPLE_TRACE_PRINTK=m CONFIG_SCHED_TRACER=y CONFIG_STACK_TRACER=y diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_tprobe_module.tc b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_tprobe_module.tc new file mode 100644 index ..d319d5ed4226 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_tprobe_module.tc @@ -0,0 +1,61 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: Generic dynamic event - add/remove tracepoint probe events on module +# requires: dynamic_events "t[:[/][]] []":README + +rmmod trace-events-sample ||: +if ! modprobe trace-events-sample ; then + echo "No trace-events sample module - please make CONFIG_SAMPLE_TRACE_EVENTS=m" + exit_unresolved; +fi +trap "rmmod trace-events-sample" EXIT + +echo 0 > events/enable +echo > dynamic_events + +TRACEPOINT1=foo_bar +TRACEPOINT2=foo_bar_with_cond + +echo "t:myevent1 $TRACEPOINT1" >> dynamic_events +echo "t:myevent2 $TRACEPOINT2" >> dynamic_events + +grep -q myevent1 dynamic_events +grep -q myevent2 dynamic_events +test -d events/tracepoints/myevent1 +test -d events/tracepoints/myevent2 + +echo "-:myevent2" >> dynamic_events + +grep -q myevent1 dynamic_events +! grep -q myevent2 dynamic_events + +echo > dynamic_events + +clear_trace + +:;: "Try to put a probe on a tracepoint in non-loaded module" ;: +rmmod trace-events-sample + +echo "t:myevent1 $TRACEPOINT1" >> dynamic_events +echo "t:myevent2 $TRACEPOINT2" >> dynamic_events + +grep -q myevent1 dynamic_events +grep -q myevent2 dynamic_events +test -d events/tracepoints/myevent1 +test -d events/tracepoints/myevent2 + +echo 1 > events/tracepoints/enable + +modprobe trace-events-sample + +sleep 2 + +grep -q "myevent1" trace +grep -q "myevent2" trace + +rmmod trace-events-sample +trap "" EXIT + +echo 0 > events/tracepoints/enable +echo > dynamic_events +clear_trace
[PATCH v13 00/20] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph
Hi, Here is the 13th version of the series to re-implement the fprobe on function-graph tracer. The previous version is; https://lore.kernel.org/all/172000134410.63468.1374887213469474.stgit@devnote2/ This version is based on v6.11-rc3. In this version, I added a bugfix as [1/20], which should go to urgent branch, and dropped the performance improvement patch which was introduced in v12 because I found that does not work with new kernel. Overview This series rewrites the fprobe on this function-graph. The purposes of this change are; 1) Remove dependency of the rethook from fprobe so that we can reduce the return hook code and shadow stack. 2) Make 'ftrace_regs' the common trace interface for the function boundary. 1) Currently we have 2(or 3) different function return hook codes, the function-graph tracer and rethook (and legacy kretprobe). But since this is redundant and needs double maintenance cost, I would like to unify those. From the user's viewpoint, function- graph tracer is very useful to grasp the execution path. For this purpose, it is hard to use the rethook in the function-graph tracer, but the opposite is possible. (Strictly speaking, kretprobe can not use it because it requires 'pt_regs' for historical reasons.) 2) Now the fprobe provides the 'pt_regs' for its handler, but that is wrong for the function entry and exit. Moreover, depending on the architecture, there is no way to accurately reproduce 'pt_regs' outside of interrupt or exception handlers. This means fprobe should not use 'pt_regs' because it does not use such exceptions. (Conversely, kprobe should use 'pt_regs' because it is an abstract interface of the software breakpoint exception.) This series changes fprobe to use function-graph tracer for tracing function entry and exit, instead of mixture of ftrace and rethook. Unlike the rethook which is a per-task list of system-wide allocated nodes, the function graph's ret_stack is a per-task shadow stack. Thus it does not need to set 'nr_maxactive' (which is the number of pre-allocated nodes). Also the handlers will get the 'ftrace_regs' instead of 'pt_regs'. Since eBPF mulit_kprobe/multi_kretprobe events still use 'pt_regs' as their register interface, this changes it to convert 'ftrace_regs' to 'pt_regs'. Of course this conversion makes an incomplete 'pt_regs', so users must access only registers for function parameters or return value. Design -- Instead of using ftrace's function entry hook directly, the new fprobe is built on top of the function-graph's entry and return callbacks with 'ftrace_regs'. Since the fprobe requires access to 'ftrace_regs', the architecture must support CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS and CONFIG_HAVE_FTRACE_GRAPH_FUNC, which enables to call function-graph entry callback with 'ftrace_regs', and also CONFIG_HAVE_FUNCTION_GRAPH_FREGS, which passes the ftrace_regs to return_to_handler. All fprobes share a single function-graph ops (means shares a common ftrace filter) similar to the kprobe-on-ftrace. This needs another layer to find corresponding fprobe in the common function-graph callbacks, but has much better scalability, since the number of registered function-graph ops is limited. In the entry callback, the fprobe runs its entry_handler and saves the address of 'fprobe' on the function-graph's shadow stack as data. The return callback decodes the data to get the 'fprobe' address, and runs the exit_handler. The fprobe introduces two hash-tables, one is for entry callback which searches fprobes related to the given function address passed by entry callback. The other is for a return callback which checks if the given 'fprobe' data structure pointer is still valid. Note that it is possible to unregister fprobe before the return callback runs. Thus the address validation must be done before using it in the return callback. Download This series can be applied against the ftrace/for-next branch in linux-trace tree. This series can also be found below branch. https://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git/log/?h=topic/fprobe-on-fgraph Thank you, --- Masami Hiramatsu (Google) (20): tracing: fgraph: Fix to add new fgraph_ops to array after ftrace_startup_subops() tracing: Add a comment about ftrace_regs definition tracing: Rename ftrace_regs_return_value to ftrace_regs_get_return_value function_graph: Pass ftrace_regs to entryfunc function_graph: Replace fgraph_ret_regs with ftrace_regs function_graph: Pass ftrace_regs to retfunc fprobe: Use ftrace_regs in fprobe entry handler fprobe: Use ftrace_regs in fprobe exit handler tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs tracing: Add ftrace_fill_perf_regs() for perf event tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled ftrace: Add C
[PATCH v13 01/20] tracing: fgraph: Fix to add new fgraph_ops to array after ftrace_startup_subops()
From: Masami Hiramatsu (Google) Since the register_ftrace_graph() assigns a new fgraph_ops to fgraph_array before registring it by ftrace_startup_subops(), the new fgraph_ops can be used in function_graph_enter(). In most cases, it is still OK because those fgraph_ops's hashtable is already initialized by ftrace_set_filter*() etc. But if a user registers a new fgraph_ops which does not initialize the hash list, ftrace_ops_test() in function_graph_enter() causes a NULL pointer dereference BUG because fgraph_ops->ops.func_hash is NULL. This can be reproduced by the below commands because function profiler's fgraph_ops does not initialize the hash list; # cd /sys/kernel/tracing # echo function_graph > current_tracer # echo 1 > function_profile_enabled To fix this problem, add a new fgraph_ops to fgraph_array after ftrace_startup_subops(). Thus, until the new fgraph_ops is initialized, we will see fgraph_stub on the corresponding fgraph_array entry. Fixes: c132be2c4fcc ("function_graph: Have the instances use their own ftrace_ops for filtering") Signed-off-by: Masami Hiramatsu (Google) --- kernel/trace/fgraph.c | 31 ++- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c index d1d5ea2d0a1b..d7d4fb403f6f 100644 --- a/kernel/trace/fgraph.c +++ b/kernel/trace/fgraph.c @@ -1206,18 +1206,24 @@ static void init_task_vars(int idx) read_unlock(&tasklist_lock); } -static void ftrace_graph_enable_direct(bool enable_branch) +static void ftrace_graph_enable_direct(bool enable_branch, struct fgraph_ops *gops) { trace_func_graph_ent_t func = NULL; trace_func_graph_ret_t retfunc = NULL; int i; - for_each_set_bit(i, &fgraph_array_bitmask, -sizeof(fgraph_array_bitmask) * BITS_PER_BYTE) { - func = fgraph_array[i]->entryfunc; - retfunc = fgraph_array[i]->retfunc; - fgraph_direct_gops = fgraph_array[i]; -} + if (gops) { + func = gops->entryfunc; + retfunc = gops->retfunc; + fgraph_direct_gops = gops; + } else { + for_each_set_bit(i, &fgraph_array_bitmask, +sizeof(fgraph_array_bitmask) * BITS_PER_BYTE) { + func = fgraph_array[i]->entryfunc; + retfunc = fgraph_array[i]->retfunc; + fgraph_direct_gops = fgraph_array[i]; + } + } if (WARN_ON_ONCE(!func)) return; @@ -1256,8 +1262,6 @@ int register_ftrace_graph(struct fgraph_ops *gops) ret = -ENOSPC; goto out; } - - fgraph_array[i] = gops; gops->idx = i; ftrace_graph_active++; @@ -1266,7 +1270,7 @@ int register_ftrace_graph(struct fgraph_ops *gops) ftrace_graph_disable_direct(true); if (ftrace_graph_active == 1) { - ftrace_graph_enable_direct(false); + ftrace_graph_enable_direct(false, gops); register_pm_notifier(&ftrace_suspend_notifier); ret = start_graph_tracing(); if (ret) @@ -1281,14 +1285,15 @@ int register_ftrace_graph(struct fgraph_ops *gops) } else { init_task_vars(gops->idx); } - /* Always save the function, and reset at unregistering */ gops->saved_func = gops->entryfunc; ret = ftrace_startup_subops(&graph_ops, &gops->ops, command); + if (!ret) + fgraph_array[i] = gops; + error: if (ret) { - fgraph_array[i] = &fgraph_stub; ftrace_graph_active--; gops->saved_func = NULL; fgraph_lru_release_index(i); @@ -1324,7 +1329,7 @@ void unregister_ftrace_graph(struct fgraph_ops *gops) ftrace_shutdown_subops(&graph_ops, &gops->ops, command); if (ftrace_graph_active == 1) - ftrace_graph_enable_direct(true); + ftrace_graph_enable_direct(true, NULL); else if (!ftrace_graph_active) ftrace_graph_disable_direct(false);
[PATCH v13 02/20] tracing: Add a comment about ftrace_regs definition
From: Masami Hiramatsu (Google) To clarify what will be expected on ftrace_regs, add a comment to the architecture independent definition of the ftrace_regs. Signed-off-by: Masami Hiramatsu (Google) Acked-by: Mark Rutland --- Changes in v8: - Update that the saved registers depends on the context. Changes in v3: - Add instruction pointer Changes in v2: - newly added. --- include/linux/ftrace.h | 26 ++ 1 file changed, 26 insertions(+) diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index fd5e84d0ec47..42106b3de396 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -117,6 +117,32 @@ extern int ftrace_enabled; #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS +/** + * ftrace_regs - ftrace partial/optimal register set + * + * ftrace_regs represents a group of registers which is used at the + * function entry and exit. There are three types of registers. + * + * - Registers for passing the parameters to callee, including the stack + * pointer. (e.g. rcx, rdx, rdi, rsi, r8, r9 and rsp on x86_64) + * - Registers for passing the return values to caller. + * (e.g. rax and rdx on x86_64) + * - Registers for hooking the function call and return including the + * frame pointer (the frame pointer is architecture/config dependent) + * (e.g. rip, rbp and rsp for x86_64) + * + * Also, architecture dependent fields can be used for internal process. + * (e.g. orig_ax on x86_64) + * + * On the function entry, those registers will be restored except for + * the stack pointer, so that user can change the function parameters + * and instruction pointer (e.g. live patching.) + * On the function exit, only registers which is used for return values + * are restored. + * + * NOTE: user *must not* access regs directly, only do it via APIs, because + * the member can be changed according to the architecture. + */ struct ftrace_regs { struct pt_regs regs; };
[PATCH v13 03/20] tracing: Rename ftrace_regs_return_value to ftrace_regs_get_return_value
From: Masami Hiramatsu (Google) Rename ftrace_regs_return_value to ftrace_regs_get_return_value as same as other ftrace_regs_get/set_* APIs. Signed-off-by: Masami Hiramatsu (Google) Acked-by: Mark Rutland --- Changes in v6: - Moved to top of the series. Changes in v3: - Newly added. --- arch/loongarch/include/asm/ftrace.h |2 +- arch/powerpc/include/asm/ftrace.h |2 +- arch/s390/include/asm/ftrace.h |2 +- arch/x86/include/asm/ftrace.h |2 +- include/linux/ftrace.h |2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/loongarch/include/asm/ftrace.h b/arch/loongarch/include/asm/ftrace.h index c0a682808e07..6f8517d59954 100644 --- a/arch/loongarch/include/asm/ftrace.h +++ b/arch/loongarch/include/asm/ftrace.h @@ -69,7 +69,7 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs, unsigned long ip) regs_get_kernel_argument(&(fregs)->regs, n) #define ftrace_regs_get_stack_pointer(fregs) \ kernel_stack_pointer(&(fregs)->regs) -#define ftrace_regs_return_value(fregs) \ +#define ftrace_regs_get_return_value(fregs) \ regs_return_value(&(fregs)->regs) #define ftrace_regs_set_return_value(fregs, ret) \ regs_set_return_value(&(fregs)->regs, ret) diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h index 559560286e6d..23d26f3afae4 100644 --- a/arch/powerpc/include/asm/ftrace.h +++ b/arch/powerpc/include/asm/ftrace.h @@ -59,7 +59,7 @@ ftrace_regs_get_instruction_pointer(struct ftrace_regs *fregs) regs_get_kernel_argument(&(fregs)->regs, n) #define ftrace_regs_get_stack_pointer(fregs) \ kernel_stack_pointer(&(fregs)->regs) -#define ftrace_regs_return_value(fregs) \ +#define ftrace_regs_get_return_value(fregs) \ regs_return_value(&(fregs)->regs) #define ftrace_regs_set_return_value(fregs, ret) \ regs_set_return_value(&(fregs)->regs, ret) diff --git a/arch/s390/include/asm/ftrace.h b/arch/s390/include/asm/ftrace.h index fbadca645af7..de76c21eb4a3 100644 --- a/arch/s390/include/asm/ftrace.h +++ b/arch/s390/include/asm/ftrace.h @@ -83,7 +83,7 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs, regs_get_kernel_argument(&(fregs)->regs, n) #define ftrace_regs_get_stack_pointer(fregs) \ kernel_stack_pointer(&(fregs)->regs) -#define ftrace_regs_return_value(fregs) \ +#define ftrace_regs_get_return_value(fregs) \ regs_return_value(&(fregs)->regs) #define ftrace_regs_set_return_value(fregs, ret) \ regs_set_return_value(&(fregs)->regs, ret) diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h index 0152a81d9b4a..78f6a200e15b 100644 --- a/arch/x86/include/asm/ftrace.h +++ b/arch/x86/include/asm/ftrace.h @@ -56,7 +56,7 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs) regs_get_kernel_argument(&(fregs)->regs, n) #define ftrace_regs_get_stack_pointer(fregs) \ kernel_stack_pointer(&(fregs)->regs) -#define ftrace_regs_return_value(fregs) \ +#define ftrace_regs_get_return_value(fregs) \ regs_return_value(&(fregs)->regs) #define ftrace_regs_set_return_value(fregs, ret) \ regs_set_return_value(&(fregs)->regs, ret) diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 42106b3de396..f84fb9635fb0 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -183,7 +183,7 @@ static __always_inline bool ftrace_regs_has_args(struct ftrace_regs *fregs) regs_get_kernel_argument(ftrace_get_regs(fregs), n) #define ftrace_regs_get_stack_pointer(fregs) \ kernel_stack_pointer(ftrace_get_regs(fregs)) -#define ftrace_regs_return_value(fregs) \ +#define ftrace_regs_get_return_value(fregs) \ regs_return_value(ftrace_get_regs(fregs)) #define ftrace_regs_set_return_value(fregs, ret) \ regs_set_return_value(ftrace_get_regs(fregs), ret)
[PATCH v13 04/20] function_graph: Pass ftrace_regs to entryfunc
From: Masami Hiramatsu (Google) Pass ftrace_regs to the fgraph_ops::entryfunc(). If ftrace_regs is not available, it passes a NULL instead. User callback function can access some registers (including return address) via this ftrace_regs. Signed-off-by: Masami Hiramatsu (Google) --- Changes in v11: - Update for the latest for-next branch. Changes in v8: - Just pass ftrace_regs to the handler instead of adding a new entryregfunc. - Update riscv ftrace_graph_func(). Changes in v3: - Update for new multiple fgraph. --- arch/arm64/kernel/ftrace.c | 20 +++- arch/loongarch/kernel/ftrace_dyn.c | 10 +- arch/powerpc/kernel/trace/ftrace.c |2 + arch/powerpc/kernel/trace/ftrace_64_pg.c | 10 -- arch/riscv/kernel/ftrace.c | 17 ++ arch/x86/kernel/ftrace.c | 50 +- include/linux/ftrace.h | 18 --- kernel/trace/fgraph.c| 23 -- kernel/trace/ftrace.c|3 +- kernel/trace/trace.h |3 +- kernel/trace/trace_functions_graph.c |3 +- kernel/trace/trace_irqsoff.c |3 +- kernel/trace/trace_sched_wakeup.c|3 +- kernel/trace/trace_selftest.c|8 +++-- 14 files changed, 128 insertions(+), 45 deletions(-) diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c index a650f5e11fc5..bc647b725e6a 100644 --- a/arch/arm64/kernel/ftrace.c +++ b/arch/arm64/kernel/ftrace.c @@ -481,7 +481,25 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent, void ftrace_graph_func(unsigned long ip, unsigned long parent_ip, struct ftrace_ops *op, struct ftrace_regs *fregs) { - prepare_ftrace_return(ip, &fregs->lr, fregs->fp); + unsigned long return_hooker = (unsigned long)&return_to_handler; + unsigned long frame_pointer = fregs->fp; + unsigned long *parent = &fregs->lr; + unsigned long old; + + if (unlikely(atomic_read(¤t->tracing_graph_pause))) + return; + + /* +* Note: +* No protection against faulting at *parent, which may be seen +* on other archs. It's unlikely on AArch64. +*/ + old = *parent; + + if (!function_graph_enter_regs(old, ip, frame_pointer, + (void *)frame_pointer, fregs)) { + *parent = return_hooker; + } } #else /* diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c index bff058317062..966e0f7f7aca 100644 --- a/arch/loongarch/kernel/ftrace_dyn.c +++ b/arch/loongarch/kernel/ftrace_dyn.c @@ -243,8 +243,16 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip, { struct pt_regs *regs = &fregs->regs; unsigned long *parent = (unsigned long *)®s->regs[1]; + unsigned long return_hooker = (unsigned long)&return_to_handler; + unsigned long old; + + if (unlikely(atomic_read(¤t->tracing_graph_pause))) + return; + + old = *parent; - prepare_ftrace_return(ip, (unsigned long *)parent); + if (!function_graph_enter_regs(old, ip, 0, parent, fregs)) + *parent = return_hooker; } #else static int ftrace_modify_graph_caller(bool enable) diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c index d8d6b4fd9a14..a1a0e0b57662 100644 --- a/arch/powerpc/kernel/trace/ftrace.c +++ b/arch/powerpc/kernel/trace/ftrace.c @@ -434,7 +434,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip, if (bit < 0) goto out; - if (!function_graph_enter(parent_ip, ip, 0, (unsigned long *)sp)) + if (!function_graph_enter_regs(parent_ip, ip, 0, (unsigned long *)sp, fregs)) parent_ip = ppc_function_entry(return_to_handler); ftrace_test_recursion_unlock(bit); diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.c b/arch/powerpc/kernel/trace/ftrace_64_pg.c index 12fab1803bcf..4ae9eeb1c8f1 100644 --- a/arch/powerpc/kernel/trace/ftrace_64_pg.c +++ b/arch/powerpc/kernel/trace/ftrace_64_pg.c @@ -800,7 +800,8 @@ int ftrace_disable_ftrace_graph_caller(void) * in current thread info. Return the address we want to divert to. */ static unsigned long -__prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long sp) +__prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long sp, + struct ftrace_regs *fregs) { unsigned long return_hooker; int bit; @@ -817,7 +818,7 @@ __prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long sp return_hooker = ppc_function_entry(return_to_handler); - if (!function_graph_enter(parent, ip, 0, (unsigned long *)sp)) + if (!function_graph_enter_regs(parent, ip, 0, (unsigned long *)sp, freg
[PATCH v13 05/20] function_graph: Replace fgraph_ret_regs with ftrace_regs
From: Masami Hiramatsu (Google) Use ftrace_regs instead of fgraph_ret_regs for tracing return value on function_graph tracer because of simplifying the callback interface. The CONFIG_HAVE_FUNCTION_GRAPH_RETVAL is also replaced by CONFIG_HAVE_FUNCTION_GRAPH_FREGS. Signed-off-by: Masami Hiramatsu (Google) --- Changes in v8: - Newly added. --- arch/arm64/Kconfig |1 + arch/arm64/include/asm/ftrace.h | 23 ++- arch/arm64/kernel/asm-offsets.c | 12 arch/arm64/kernel/entry-ftrace.S| 32 ++-- arch/loongarch/Kconfig |2 +- arch/loongarch/include/asm/ftrace.h | 24 ++-- arch/loongarch/kernel/asm-offsets.c | 12 arch/loongarch/kernel/mcount.S | 17 ++--- arch/loongarch/kernel/mcount_dyn.S | 14 +++--- arch/riscv/Kconfig |2 +- arch/riscv/include/asm/ftrace.h | 26 +- arch/riscv/kernel/mcount.S | 24 +--- arch/s390/Kconfig |2 +- arch/s390/include/asm/ftrace.h | 26 +- arch/s390/kernel/asm-offsets.c |6 -- arch/s390/kernel/mcount.S |9 + arch/x86/Kconfig|2 +- arch/x86/include/asm/ftrace.h | 22 ++ arch/x86/kernel/ftrace_32.S | 15 +-- arch/x86/kernel/ftrace_64.S | 17 + include/linux/ftrace.h | 14 +++--- kernel/trace/Kconfig|4 ++-- kernel/trace/fgraph.c | 21 + 23 files changed, 122 insertions(+), 205 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index a2f8ff354ca6..17947f625b06 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -211,6 +211,7 @@ config ARM64 select HAVE_FTRACE_MCOUNT_RECORD select HAVE_FUNCTION_TRACER select HAVE_FUNCTION_ERROR_INJECTION + select HAVE_FUNCTION_GRAPH_FREGS select HAVE_FUNCTION_GRAPH_TRACER select HAVE_FUNCTION_GRAPH_RETVAL select HAVE_GCC_PLUGINS diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h index dc9cf0bd2a4c..dffaab3dd1f1 100644 --- a/arch/arm64/include/asm/ftrace.h +++ b/arch/arm64/include/asm/ftrace.h @@ -126,6 +126,12 @@ ftrace_override_function_with_return(struct ftrace_regs *fregs) fregs->pc = fregs->lr; } +static __always_inline unsigned long +ftrace_regs_get_frame_pointer(const struct ftrace_regs *fregs) +{ + return fregs->fp; +} + int ftrace_regs_query_register_offset(const char *name); int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec); @@ -183,23 +189,6 @@ static inline bool arch_syscall_match_sym_name(const char *sym, #ifndef __ASSEMBLY__ #ifdef CONFIG_FUNCTION_GRAPH_TRACER -struct fgraph_ret_regs { - /* x0 - x7 */ - unsigned long regs[8]; - - unsigned long fp; - unsigned long __unused; -}; - -static inline unsigned long fgraph_ret_regs_return_value(struct fgraph_ret_regs *ret_regs) -{ - return ret_regs->regs[0]; -} - -static inline unsigned long fgraph_ret_regs_frame_pointer(struct fgraph_ret_regs *ret_regs) -{ - return ret_regs->fp; -} void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent, unsigned long frame_pointer); diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index 27de1dddb0ab..9e03c9a7e5c3 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -201,18 +201,6 @@ int main(void) DEFINE(FTRACE_OPS_FUNC, offsetof(struct ftrace_ops, func)); #endif BLANK(); -#ifdef CONFIG_FUNCTION_GRAPH_TRACER - DEFINE(FGRET_REGS_X0,offsetof(struct fgraph_ret_regs, regs[0])); - DEFINE(FGRET_REGS_X1,offsetof(struct fgraph_ret_regs, regs[1])); - DEFINE(FGRET_REGS_X2,offsetof(struct fgraph_ret_regs, regs[2])); - DEFINE(FGRET_REGS_X3,offsetof(struct fgraph_ret_regs, regs[3])); - DEFINE(FGRET_REGS_X4,offsetof(struct fgraph_ret_regs, regs[4])); - DEFINE(FGRET_REGS_X5,offsetof(struct fgraph_ret_regs, regs[5])); - DEFINE(FGRET_REGS_X6,offsetof(struct fgraph_ret_regs, regs[6])); - DEFINE(FGRET_REGS_X7,offsetof(struct fgraph_ret_regs, regs[7])); - DEFINE(FGRET_REGS_FP,offsetof(struct fgraph_ret_regs, fp)); - DEFINE(FGRET_REGS_SIZE, sizeof(struct fgraph_ret_regs)); -#endif #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS DEFINE(FTRACE_OPS_DIRECT_CALL, offsetof(struct ftrace_ops, direct_call)); #endif diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S index f
[PATCH v13 06/20] function_graph: Pass ftrace_regs to retfunc
From: Masami Hiramatsu (Google) Pass ftrace_regs to the fgraph_ops::retfunc(). If ftrace_regs is not available, it passes a NULL instead. User callback function can access some registers (including return address) via this ftrace_regs. Signed-off-by: Masami Hiramatsu (Google) --- Changes in v8: - Pass ftrace_regs to retfunc, instead of adding retregfunc. Changes in v6: - update to use ftrace_regs_get_return_value() because of reordering patches. Changes in v3: - Update for new multiple fgraph. - Save the return address to instruction pointer in ftrace_regs. --- include/linux/ftrace.h |3 ++- kernel/trace/fgraph.c| 16 +++- kernel/trace/ftrace.c|3 ++- kernel/trace/trace.h |3 ++- kernel/trace/trace_functions_graph.c |7 --- kernel/trace/trace_irqsoff.c |3 ++- kernel/trace/trace_sched_wakeup.c|3 ++- kernel/trace/trace_selftest.c|3 ++- 8 files changed, 27 insertions(+), 14 deletions(-) diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 13987cd63553..e7c41d9988e1 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -1069,7 +1069,8 @@ struct fgraph_ops; /* Type of the callback handlers for tracing function graph*/ typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *, - struct fgraph_ops *); /* return */ + struct fgraph_ops *, + struct ftrace_regs *); /* return */ typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *, struct fgraph_ops *, struct ftrace_regs *); /* entry */ diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c index 30bebe43607d..6a3e2db16aa4 100644 --- a/kernel/trace/fgraph.c +++ b/kernel/trace/fgraph.c @@ -297,7 +297,8 @@ static int entry_run(struct ftrace_graph_ent *trace, struct fgraph_ops *ops, } /* ftrace_graph_return set to this to tell some archs to run function graph */ -static void return_run(struct ftrace_graph_ret *trace, struct fgraph_ops *ops) +static void return_run(struct ftrace_graph_ret *trace, struct fgraph_ops *ops, + struct ftrace_regs *fregs) { } @@ -491,7 +492,8 @@ int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, } static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace, - struct fgraph_ops *gops) + struct fgraph_ops *gops, + struct ftrace_regs *fregs) { } @@ -787,6 +789,9 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe } trace.rettime = trace_clock_local(); + if (fregs) + ftrace_regs_set_instruction_pointer(fregs, ret); + #ifdef CONFIG_FUNCTION_GRAPH_RETVAL trace.retval = ftrace_regs_get_return_value(fregs); #endif @@ -796,7 +801,7 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe #ifdef CONFIG_HAVE_STATIC_CALL if (static_branch_likely(&fgraph_do_direct)) { if (test_bit(fgraph_direct_gops->idx, &bitmap)) - static_call(fgraph_retfunc)(&trace, fgraph_direct_gops); + static_call(fgraph_retfunc)(&trace, fgraph_direct_gops, fregs); } else #endif { @@ -806,7 +811,7 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe if (gops == &fgraph_stub) continue; - gops->retfunc(&trace, gops); + gops->retfunc(&trace, gops, fregs); } } @@ -956,7 +961,8 @@ void ftrace_graph_sleep_time_control(bool enable) * Simply points to ftrace_stub, but with the proper protocol. * Defined by the linker script in linux/vmlinux.lds.h */ -void ftrace_stub_graph(struct ftrace_graph_ret *trace, struct fgraph_ops *gops); +void ftrace_stub_graph(struct ftrace_graph_ret *trace, struct fgraph_ops *gops, + struct ftrace_regs *fregs); /* The callbacks that hook a function */ trace_func_graph_ret_t ftrace_graph_return = ftrace_stub_graph; diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 775040a9f541..fd6c5a50c5e5 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -840,7 +840,8 @@ static int profile_graph_entry(struct ftrace_graph_ent *trace, } static void profile_graph_return(struct ftrace_graph_ret *trace, -struct fgraph_ops *gops) +struct fgraph_ops *gops, +struct ftrace_regs *fregs) { struct ftrace_ret_stack *ret_stack; struct ftrace_profile_stat *stat; diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 28d8ad5e31e6..
[PATCH v13 07/20] fprobe: Use ftrace_regs in fprobe entry handler
From: Masami Hiramatsu (Google) This allows fprobes to be available with CONFIG_DYNAMIC_FTRACE_WITH_ARGS instead of CONFIG_DYNAMIC_FTRACE_WITH_REGS, then we can enable fprobe on arm64. Signed-off-by: Masami Hiramatsu (Google) Acked-by: Florent Revest --- Changes in v6: - Keep using SAVE_REGS flag to avoid breaking bpf kprobe-multi test. --- include/linux/fprobe.h |2 +- kernel/trace/Kconfig|3 ++- kernel/trace/bpf_trace.c| 10 +++--- kernel/trace/fprobe.c |3 ++- kernel/trace/trace_fprobe.c |6 +- lib/test_fprobe.c |4 ++-- samples/fprobe/fprobe_example.c |2 +- 7 files changed, 20 insertions(+), 10 deletions(-) diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h index f39869588117..ca64ee5e45d2 100644 --- a/include/linux/fprobe.h +++ b/include/linux/fprobe.h @@ -10,7 +10,7 @@ struct fprobe; typedef int (*fprobe_entry_cb)(struct fprobe *fp, unsigned long entry_ip, - unsigned long ret_ip, struct pt_regs *regs, + unsigned long ret_ip, struct ftrace_regs *regs, void *entry_data); typedef void (*fprobe_exit_cb)(struct fprobe *fp, unsigned long entry_ip, diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index ab277eff80dc..0c6a03554c13 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -287,7 +287,7 @@ config DYNAMIC_FTRACE_WITH_ARGS config FPROBE bool "Kernel Function Probe (fprobe)" depends on FUNCTION_TRACER - depends on DYNAMIC_FTRACE_WITH_REGS + depends on DYNAMIC_FTRACE_WITH_REGS || DYNAMIC_FTRACE_WITH_ARGS depends on HAVE_RETHOOK select RETHOOK default n @@ -672,6 +672,7 @@ config FPROBE_EVENTS select TRACING select PROBE_EVENTS select DYNAMIC_EVENTS + depends on DYNAMIC_FTRACE_WITH_REGS default y help This allows user to add tracing events on the function entry and diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index cd098846e251..947808a002d0 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -2605,7 +2605,7 @@ struct bpf_session_run_ctx { void *data; }; -#ifdef CONFIG_FPROBE +#if defined(CONFIG_FPROBE) && defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS) struct bpf_kprobe_multi_link { struct bpf_link link; struct fprobe fp; @@ -2857,12 +2857,16 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link, static int kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip, - unsigned long ret_ip, struct pt_regs *regs, + unsigned long ret_ip, struct ftrace_regs *fregs, void *data) { + struct pt_regs *regs = ftrace_get_regs(fregs); struct bpf_kprobe_multi_link *link; int err; + if (!regs) + return 0; + link = container_of(fp, struct bpf_kprobe_multi_link, fp); err = kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs, false, data); return is_kprobe_session(link->link.prog) ? err : 0; @@ -3137,7 +3141,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr kvfree(cookies); return err; } -#else /* !CONFIG_FPROBE */ +#else /* !CONFIG_FPROBE || !CONFIG_DYNAMIC_FTRACE_WITH_REGS */ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) { return -EOPNOTSUPP; diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c index 9ff018245840..3d3789283873 100644 --- a/kernel/trace/fprobe.c +++ b/kernel/trace/fprobe.c @@ -46,7 +46,7 @@ static inline void __fprobe_handler(unsigned long ip, unsigned long parent_ip, } if (fp->entry_handler) - ret = fp->entry_handler(fp, ip, parent_ip, ftrace_get_regs(fregs), entry_data); + ret = fp->entry_handler(fp, ip, parent_ip, fregs, entry_data); /* If entry_handler returns !0, nmissed is not counted. */ if (rh) { @@ -182,6 +182,7 @@ static void fprobe_init(struct fprobe *fp) fp->ops.func = fprobe_kprobe_handler; else fp->ops.func = fprobe_handler; + fp->ops.flags |= FTRACE_OPS_FL_SAVE_REGS; } diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c index 62e6a8f4aae9..b2c20d4fdfd7 100644 --- a/kernel/trace/trace_fprobe.c +++ b/kernel/trace/trace_fprobe.c @@ -338,12 +338,16 @@ NOKPROBE_SYMBOL(fexit_perf_func); #endif /* CONFIG_PERF_EVENTS */ static int fentry_dispatcher(struct fprobe *fp, unsigned long entry_ip, -unsigned long ret_ip, struct pt_regs *regs, +unsigned long ret_ip, struct ftrace_regs *fregs, void *entry_data) { struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp); + struct pt_reg
[PATCH v13 08/20] fprobe: Use ftrace_regs in fprobe exit handler
From: Masami Hiramatsu (Google) Change the fprobe exit handler to use ftrace_regs structure instead of pt_regs. This also introduce HAVE_PT_REGS_TO_FTRACE_REGS_CAST which means the ftrace_regs's memory layout is equal to the pt_regs so that those are able to cast. Fprobe introduces a new dependency with that. Signed-off-by: Masami Hiramatsu (Google) --- Changes in v3: - Use ftrace_regs_get_return_value() Changes from previous series: NOTHING, just forward ported. --- arch/loongarch/Kconfig |1 + arch/s390/Kconfig |1 + arch/x86/Kconfig|1 + include/linux/fprobe.h |2 +- include/linux/ftrace.h |6 ++ kernel/trace/Kconfig|8 kernel/trace/bpf_trace.c|6 +- kernel/trace/fprobe.c |3 ++- kernel/trace/trace_fprobe.c |6 +- lib/test_fprobe.c |6 +++--- samples/fprobe/fprobe_example.c |2 +- 11 files changed, 34 insertions(+), 8 deletions(-) diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 974f08f65f63..73cb657496c8 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -122,6 +122,7 @@ config LOONGARCH select HAVE_DMA_CONTIGUOUS select HAVE_DYNAMIC_FTRACE select HAVE_DYNAMIC_FTRACE_WITH_ARGS + select HAVE_PT_REGS_TO_FTRACE_REGS_CAST select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS select HAVE_DYNAMIC_FTRACE_WITH_REGS select HAVE_EBPF_JIT diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 12e942cfbcde..eebc299f424b 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -175,6 +175,7 @@ config S390 select HAVE_DMA_CONTIGUOUS select HAVE_DYNAMIC_FTRACE select HAVE_DYNAMIC_FTRACE_WITH_ARGS + select HAVE_PT_REGS_TO_FTRACE_REGS_CAST select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS select HAVE_DYNAMIC_FTRACE_WITH_REGS select HAVE_EBPF_JIT if HAVE_MARCH_Z196_FEATURES diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 047384e4d93a..59788d8b220e 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -218,6 +218,7 @@ config X86 select HAVE_DYNAMIC_FTRACE select HAVE_DYNAMIC_FTRACE_WITH_REGS select HAVE_DYNAMIC_FTRACE_WITH_ARGSif X86_64 + select HAVE_PT_REGS_TO_FTRACE_REGS_CAST if X86_64 select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS select HAVE_SAMPLE_FTRACE_DIRECTif X86_64 select HAVE_SAMPLE_FTRACE_DIRECT_MULTI if X86_64 diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h index ca64ee5e45d2..ef609bcca0f9 100644 --- a/include/linux/fprobe.h +++ b/include/linux/fprobe.h @@ -14,7 +14,7 @@ typedef int (*fprobe_entry_cb)(struct fprobe *fp, unsigned long entry_ip, void *entry_data); typedef void (*fprobe_exit_cb)(struct fprobe *fp, unsigned long entry_ip, - unsigned long ret_ip, struct pt_regs *regs, + unsigned long ret_ip, struct ftrace_regs *regs, void *entry_data); /** diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index e7c41d9988e1..a1b2ef492c7f 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -162,6 +162,12 @@ struct ftrace_regs { #define ftrace_regs_set_instruction_pointer(fregs, ip) do { } while (0) #endif /* CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */ +#ifdef CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST + +static_assert(sizeof(struct pt_regs) == sizeof(struct ftrace_regs)); + +#endif /* CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */ + static __always_inline struct pt_regs *ftrace_get_regs(struct ftrace_regs *fregs) { if (!fregs) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 0c6a03554c13..2e2b39699542 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -57,6 +57,13 @@ config HAVE_DYNAMIC_FTRACE_WITH_ARGS This allows for use of ftrace_regs_get_argument() and ftrace_regs_get_stack_pointer(). +config HAVE_PT_REGS_TO_FTRACE_REGS_CAST + bool + help +If this is set, the memory layout of the ftrace_regs data structure +is the same as the pt_regs. So the pt_regs is possible to be casted +to ftrace_regs. + config HAVE_DYNAMIC_FTRACE_NO_PATCHABLE bool help @@ -288,6 +295,7 @@ config FPROBE bool "Kernel Function Probe (fprobe)" depends on FUNCTION_TRACER depends on DYNAMIC_FTRACE_WITH_REGS || DYNAMIC_FTRACE_WITH_ARGS + depends on HAVE_PT_REGS_TO_FTRACE_REGS_CAST || !HAVE_DYNAMIC_FTRACE_WITH_ARGS depends on HAVE_RETHOOK select RETHOOK default n diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 947808a002d0..cdba9981b048 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -2874,10 +2874,14 @@ kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip, static void kprobe_m
[PATCH v13 09/20] tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs
From: Masami Hiramatsu (Google) Add ftrace_partial_regs() which converts the ftrace_regs to pt_regs. This is for the eBPF which needs this to keep the same pt_regs interface to access registers. Thus when replacing the pt_regs with ftrace_regs in fprobes (which is used by kprobe_multi eBPF event), this will be used. If the architecture defines its own ftrace_regs, this copies partial registers to pt_regs and returns it. If not, ftrace_regs is the same as pt_regs and ftrace_partial_regs() will return ftrace_regs::regs. Signed-off-by: Masami Hiramatsu (Google) Acked-by: Florent Revest --- Changes in v8: - Add the reason why this required in changelog. Changes from previous series: NOTHING, just forward ported. --- arch/arm64/include/asm/ftrace.h | 11 +++ include/linux/ftrace.h | 17 + 2 files changed, 28 insertions(+) diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h index dffaab3dd1f1..5cd587afab6d 100644 --- a/arch/arm64/include/asm/ftrace.h +++ b/arch/arm64/include/asm/ftrace.h @@ -132,6 +132,17 @@ ftrace_regs_get_frame_pointer(const struct ftrace_regs *fregs) return fregs->fp; } +static __always_inline struct pt_regs * +ftrace_partial_regs(const struct ftrace_regs *fregs, struct pt_regs *regs) +{ + memcpy(regs->regs, fregs->regs, sizeof(u64) * 9); + regs->sp = fregs->sp; + regs->pc = fregs->pc; + regs->regs[29] = fregs->fp; + regs->regs[30] = fregs->lr; + return regs; +} + int ftrace_regs_query_register_offset(const char *name); int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec); diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index a1b2ef492c7f..bd9a26bdf660 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -176,6 +176,23 @@ static __always_inline struct pt_regs *ftrace_get_regs(struct ftrace_regs *fregs return arch_ftrace_get_regs(fregs); } +#if !defined(CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS) || \ + defined(CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST) + +static __always_inline struct pt_regs * +ftrace_partial_regs(struct ftrace_regs *fregs, struct pt_regs *regs) +{ + /* +* If CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST=y, ftrace_regs memory +* layout is the same as pt_regs. So always returns that address. +* Since arch_ftrace_get_regs() will check some members and may return +* NULL, we can not use it. +*/ + return &fregs->regs; +} + +#endif /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS || CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */ + /* * When true, the ftrace_regs_{get,set}_*() functions may be used on fregs. * Note: this can be true even when ftrace_get_regs() cannot provide a pt_regs.
[PATCH v13 10/20] tracing: Add ftrace_fill_perf_regs() for perf event
From: Masami Hiramatsu (Google) Add ftrace_fill_perf_regs() which should be compatible with the perf_fetch_caller_regs(). In other words, the pt_regs returned from the ftrace_fill_perf_regs() must satisfy 'user_mode(regs) == false' and can be used for stack tracing. Signed-off-by: Masami Hiramatsu (Google) --- Changes from previous series: NOTHING, just forward ported. --- arch/arm64/include/asm/ftrace.h |7 +++ arch/powerpc/include/asm/ftrace.h |7 +++ arch/s390/include/asm/ftrace.h|5 + arch/x86/include/asm/ftrace.h |7 +++ include/linux/ftrace.h| 31 +++ 5 files changed, 57 insertions(+) diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h index 5cd587afab6d..14ecb9a418d9 100644 --- a/arch/arm64/include/asm/ftrace.h +++ b/arch/arm64/include/asm/ftrace.h @@ -143,6 +143,13 @@ ftrace_partial_regs(const struct ftrace_regs *fregs, struct pt_regs *regs) return regs; } +#define arch_ftrace_fill_perf_regs(fregs, _regs) do { \ + (_regs)->pc = (fregs)->pc; \ + (_regs)->regs[29] = (fregs)->fp;\ + (_regs)->sp = (fregs)->sp; \ + (_regs)->pstate = PSR_MODE_EL1h;\ + } while (0) + int ftrace_regs_query_register_offset(const char *name); int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec); diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h index 23d26f3afae4..e6ff6834bf7e 100644 --- a/arch/powerpc/include/asm/ftrace.h +++ b/arch/powerpc/include/asm/ftrace.h @@ -42,6 +42,13 @@ static __always_inline struct pt_regs *arch_ftrace_get_regs(struct ftrace_regs * return fregs->regs.msr ? &fregs->regs : NULL; } +#define arch_ftrace_fill_perf_regs(fregs, _regs) do { \ + (_regs)->result = 0;\ + (_regs)->nip = (fregs)->regs.nip; \ + (_regs)->gpr[1] = (fregs)->regs.gpr[1]; \ + asm volatile("mfmsr %0" : "=r" ((_regs)->msr)); \ + } while (0) + static __always_inline void ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs, unsigned long ip) diff --git a/arch/s390/include/asm/ftrace.h b/arch/s390/include/asm/ftrace.h index 9cdd48a46bf7..0d9f6df21f81 100644 --- a/arch/s390/include/asm/ftrace.h +++ b/arch/s390/include/asm/ftrace.h @@ -84,6 +84,11 @@ ftrace_regs_get_frame_pointer(struct ftrace_regs *fregs) return sp[0]; /* return backchain */ } +#define arch_ftrace_fill_perf_regs(fregs, _regs)do { \ + (_regs)->psw.addr = (fregs)->regs.psw.addr; \ + (_regs)->gprs[15] = (fregs)->regs.gprs[15]; \ + } while (0) + #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS /* * When an ftrace registered caller is tracing a function that is diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h index 669771ef3b5b..1f4d1f7b19ed 100644 --- a/arch/x86/include/asm/ftrace.h +++ b/arch/x86/include/asm/ftrace.h @@ -46,6 +46,13 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs) return &fregs->regs; } +#define arch_ftrace_fill_perf_regs(fregs, _regs) do { \ + (_regs)->ip = (fregs)->regs.ip; \ + (_regs)->sp = (fregs)->regs.sp; \ + (_regs)->cs = __KERNEL_CS; \ + (_regs)->flags = 0; \ + } while (0) + #define ftrace_regs_set_instruction_pointer(fregs, _ip)\ do { (fregs)->regs.ip = (_ip); } while (0) diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index bd9a26bdf660..3edf2427ae73 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -193,6 +193,37 @@ ftrace_partial_regs(struct ftrace_regs *fregs, struct pt_regs *regs) #endif /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS || CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */ +#ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS + +/* + * Please define arch dependent pt_regs which compatible to the + * perf_arch_fetch_caller_regs() but based on ftrace_regs. + * This requires + * - user_mode(_regs) returns false (always kernel mode). + * - able to use the _regs for stack trace. + */ +#ifndef arch_ftrace_fill_perf_regs +/* As same as perf_arch_fetch_caller_regs(), do nothing by default */ +#define arch_ftrace_fill_perf_regs(fregs, _regs) do {} while (0) +#endif + +static __always_inline struct pt_regs * +ftrace_fill_perf_regs(struct ftrace_regs *fregs, struct pt_regs *regs) +{ + arch_ftrace_fill_perf_regs(fregs, regs); + return regs; +} + +#else /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */ + +static __always_inline struct pt_regs * +ftrace_fill_perf_regs(struct ftrace_regs *fregs, struct pt_regs *regs) +{ + return &fregs->regs; +} + +#endif + /*
[PATCH v13 11/20] tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
From: Masami Hiramatsu (Google) Allow fprobe events to be enabled with CONFIG_DYNAMIC_FTRACE_WITH_ARGS. With this change, fprobe events mostly use ftrace_regs instead of pt_regs. Note that if the arch doesn't enable HAVE_PT_REGS_COMPAT_FTRACE_REGS, fprobe events will not be able to be used from perf. Signed-off-by: Masami Hiramatsu (Google) --- Changes in v9: - Copy store_trace_entry_data() as store_fprobe_entry_data() for fprobe. Chagnes in v3: - Use ftrace_regs_get_return_value(). Changes in v2: - Define ftrace_regs_get_kernel_stack_nth() for !CONFIG_HAVE_REGS_AND_STACK_ACCESS_API. Changes from previous series: Update against the new series. --- include/linux/ftrace.h | 17 ++ kernel/trace/Kconfig|1 kernel/trace/trace_fprobe.c | 107 +-- kernel/trace/trace_probe_tmpl.h |2 - 4 files changed, 86 insertions(+), 41 deletions(-) diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 3edf2427ae73..63fb91088a23 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -255,6 +255,23 @@ static __always_inline bool ftrace_regs_has_args(struct ftrace_regs *fregs) frame_pointer(&(fregs)->regs) #endif +#ifdef CONFIG_HAVE_REGS_AND_STACK_ACCESS_API +static __always_inline unsigned long +ftrace_regs_get_kernel_stack_nth(struct ftrace_regs *fregs, unsigned int nth) +{ + unsigned long *stackp; + + stackp = (unsigned long *)ftrace_regs_get_stack_pointer(fregs); + if (((unsigned long)(stackp + nth) & ~(THREAD_SIZE - 1)) == + ((unsigned long)stackp & ~(THREAD_SIZE - 1))) + return *(stackp + nth); + + return 0; +} +#else /* !CONFIG_HAVE_REGS_AND_STACK_ACCESS_API */ +#define ftrace_regs_get_kernel_stack_nth(fregs, nth) (0L) +#endif /* CONFIG_HAVE_REGS_AND_STACK_ACCESS_API */ + typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip, struct ftrace_ops *op, struct ftrace_regs *fregs); diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 2e2b39699542..0fc4c3129c19 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -680,7 +680,6 @@ config FPROBE_EVENTS select TRACING select PROBE_EVENTS select DYNAMIC_EVENTS - depends on DYNAMIC_FTRACE_WITH_REGS default y help This allows user to add tracing events on the function entry and diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c index 273cdf3cf70c..86cd6a8c806a 100644 --- a/kernel/trace/trace_fprobe.c +++ b/kernel/trace/trace_fprobe.c @@ -133,7 +133,7 @@ static int process_fetch_insn(struct fetch_insn *code, void *rec, void *edata, void *dest, void *base) { - struct pt_regs *regs = rec; + struct ftrace_regs *fregs = rec; unsigned long val; int ret; @@ -141,17 +141,17 @@ process_fetch_insn(struct fetch_insn *code, void *rec, void *edata, /* 1st stage: get value from context */ switch (code->op) { case FETCH_OP_STACK: - val = regs_get_kernel_stack_nth(regs, code->param); + val = ftrace_regs_get_kernel_stack_nth(fregs, code->param); break; case FETCH_OP_STACKP: - val = kernel_stack_pointer(regs); + val = ftrace_regs_get_stack_pointer(fregs); break; case FETCH_OP_RETVAL: - val = regs_return_value(regs); + val = ftrace_regs_get_return_value(fregs); break; #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API case FETCH_OP_ARG: - val = regs_get_kernel_argument(regs, code->param); + val = ftrace_regs_get_argument(fregs, code->param); break; case FETCH_OP_EDATA: val = *(unsigned long *)((unsigned long)edata + code->offset); @@ -174,7 +174,7 @@ NOKPROBE_SYMBOL(process_fetch_insn) /* function entry handler */ static nokprobe_inline void __fentry_trace_func(struct trace_fprobe *tf, unsigned long entry_ip, - struct pt_regs *regs, + struct ftrace_regs *fregs, struct trace_event_file *trace_file) { struct fentry_trace_entry_head *entry; @@ -188,41 +188,71 @@ __fentry_trace_func(struct trace_fprobe *tf, unsigned long entry_ip, if (trace_trigger_soft_disabled(trace_file)) return; - dsize = __get_data_size(&tf->tp, regs, NULL); + dsize = __get_data_size(&tf->tp, fregs, NULL); entry = trace_event_buffer_reserve(&fbuffer, trace_file, sizeof(*entry) + tf->tp.size + dsize); if (!entry) return; - fbuffer.regs = regs; + fbuffer.regs = ftrace_get_regs(fregs); entry = fbuffer.entry = ring_buffer_event_data(fbuffer.event); entry->ip = entry_ip; - store_trace_args(&entry[1], &tf->t
[PATCH v13 12/20] bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
From: Masami Hiramatsu (Google) Enable kprobe_multi feature if CONFIG_FPROBE is enabled. The pt_regs is converted from ftrace_regs by ftrace_partial_regs(), thus some registers may always returns 0. But it should be enough for function entry (access arguments) and exit (access return value). Signed-off-by: Masami Hiramatsu (Google) Acked-by: Florent Revest --- Changes in v9: - Avoid wasting memory for bpf_kprobe_multi_pt_regs when CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST=y --- kernel/trace/bpf_trace.c | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index cdba9981b048..deb629f4a510 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -2605,7 +2605,7 @@ struct bpf_session_run_ctx { void *data; }; -#if defined(CONFIG_FPROBE) && defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS) +#ifdef CONFIG_FPROBE struct bpf_kprobe_multi_link { struct bpf_link link; struct fprobe fp; @@ -2628,6 +2628,13 @@ struct user_syms { char *buf; }; +#ifndef CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST +static DEFINE_PER_CPU(struct pt_regs, bpf_kprobe_multi_pt_regs); +#define bpf_kprobe_multi_pt_regs_ptr() this_cpu_ptr(&bpf_kprobe_multi_pt_regs) +#else +#define bpf_kprobe_multi_pt_regs_ptr() (NULL) +#endif + static int copy_user_syms(struct user_syms *us, unsigned long __user *usyms, u32 cnt) { unsigned long __user usymbol; @@ -2822,7 +2829,7 @@ static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx) static int kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link, - unsigned long entry_ip, struct pt_regs *regs, + unsigned long entry_ip, struct ftrace_regs *fregs, bool is_return, void *data) { struct bpf_kprobe_multi_run_ctx run_ctx = { @@ -2834,6 +2841,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link, .entry_ip = entry_ip, }; struct bpf_run_ctx *old_run_ctx; + struct pt_regs *regs; int err; if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) { @@ -2844,6 +2852,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link, migrate_disable(); rcu_read_lock(); + regs = ftrace_partial_regs(fregs, bpf_kprobe_multi_pt_regs_ptr()); old_run_ctx = bpf_set_run_ctx(&run_ctx.session_ctx.run_ctx); err = bpf_prog_run(link->link.prog, regs); bpf_reset_run_ctx(old_run_ctx); @@ -2860,15 +2869,11 @@ kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip, unsigned long ret_ip, struct ftrace_regs *fregs, void *data) { - struct pt_regs *regs = ftrace_get_regs(fregs); struct bpf_kprobe_multi_link *link; int err; - if (!regs) - return 0; - link = container_of(fp, struct bpf_kprobe_multi_link, fp); - err = kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs, false, data); + err = kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs, false, data); return is_kprobe_session(link->link.prog) ? err : 0; } @@ -2878,13 +2883,9 @@ kprobe_multi_link_exit_handler(struct fprobe *fp, unsigned long fentry_ip, void *data) { struct bpf_kprobe_multi_link *link; - struct pt_regs *regs = ftrace_get_regs(fregs); - - if (!regs) - return; link = container_of(fp, struct bpf_kprobe_multi_link, fp); - kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs, true, data); + kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs, true, data); } static int symbols_cmp_r(const void *a, const void *b, const void *priv) @@ -3145,7 +3146,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr kvfree(cookies); return err; } -#else /* !CONFIG_FPROBE || !CONFIG_DYNAMIC_FTRACE_WITH_REGS */ +#else /* !CONFIG_FPROBE */ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) { return -EOPNOTSUPP;
[PATCH v13 13/20] ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC
From: Masami Hiramatsu (Google) Add CONFIG_HAVE_FTRACE_GRAPH_FUNC kconfig in addition to ftrace_graph_func macro check. This is for the other feature (e.g. FPROBE) which requires to access ftrace_regs from fgraph_ops::entryfunc() can avoid compiling if the fgraph can not pass the valid ftrace_regs. Signed-off-by: Masami Hiramatsu (Google) --- Changes in v8: - Newly added. --- arch/arm64/Kconfig |1 + arch/loongarch/Kconfig |1 + arch/powerpc/Kconfig |1 + arch/riscv/Kconfig |1 + arch/x86/Kconfig |1 + kernel/trace/Kconfig |5 + 6 files changed, 10 insertions(+) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 17947f625b06..53eb9f36842d 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -208,6 +208,7 @@ config ARM64 select HAVE_SAMPLE_FTRACE_DIRECT_MULTI select HAVE_EFFICIENT_UNALIGNED_ACCESS select HAVE_GUP_FAST + select HAVE_FTRACE_GRAPH_FUNC select HAVE_FTRACE_MCOUNT_RECORD select HAVE_FUNCTION_TRACER select HAVE_FUNCTION_ERROR_INJECTION diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 73cb657496c8..9f7adca388ec 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -129,6 +129,7 @@ config LOONGARCH select HAVE_EFFICIENT_UNALIGNED_ACCESS if !ARCH_STRICT_ALIGN select HAVE_EXIT_THREAD select HAVE_GUP_FAST + select HAVE_FTRACE_GRAPH_FUNC select HAVE_FTRACE_MCOUNT_RECORD select HAVE_FUNCTION_ARG_ACCESS_API select HAVE_FUNCTION_ERROR_INJECTION diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index d7b09b064a8a..aa2669f5b314 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -238,6 +238,7 @@ config PPC select HAVE_EBPF_JIT select HAVE_EFFICIENT_UNALIGNED_ACCESS select HAVE_GUP_FAST + select HAVE_FTRACE_GRAPH_FUNC select HAVE_FTRACE_MCOUNT_RECORD select HAVE_FUNCTION_ARG_ACCESS_API select HAVE_FUNCTION_DESCRIPTORSif PPC64_ELF_ABI_V1 diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 6e8422269ba4..8f05e9fb7803 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -138,6 +138,7 @@ config RISCV select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && MMU && (CLANG_SUPPORTS_DYNAMIC_FTRACE || GCC_SUPPORTS_DYNAMIC_FTRACE) select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS select HAVE_DYNAMIC_FTRACE_WITH_ARGS if HAVE_DYNAMIC_FTRACE + select HAVE_FTRACE_GRAPH_FUNC select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL select HAVE_FUNCTION_GRAPH_TRACER select HAVE_FUNCTION_GRAPH_FREGS diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 59788d8b220e..02863509ebd1 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -228,6 +228,7 @@ config X86 select HAVE_EXIT_THREAD select HAVE_GUP_FAST select HAVE_FENTRY if X86_64 || DYNAMIC_FTRACE + select HAVE_FTRACE_GRAPH_FUNC if HAVE_FUNCTION_GRAPH_TRACER select HAVE_FTRACE_MCOUNT_RECORD select HAVE_FUNCTION_GRAPH_FREGSif HAVE_FUNCTION_GRAPH_TRACER select HAVE_FUNCTION_GRAPH_TRACER if X86_32 || (X86_64 && DYNAMIC_FTRACE) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 0fc4c3129c19..c8dfd3a233c6 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -34,6 +34,11 @@ config HAVE_FUNCTION_GRAPH_TRACER config HAVE_FUNCTION_GRAPH_FREGS bool +config HAVE_FTRACE_GRAPH_FUNC + bool + help + True if ftrace_graph_func() is defined. + config HAVE_DYNAMIC_FTRACE bool help
[PATCH v13 14/20] fprobe: Rewrite fprobe on function-graph tracer
From: Masami Hiramatsu (Google) Rewrite fprobe implementation on function-graph tracer. Major API changes are: - 'nr_maxactive' field is deprecated. - This depends on CONFIG_DYNAMIC_FTRACE_WITH_ARGS or !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS, and CONFIG_HAVE_FUNCTION_GRAPH_FREGS. So currently works only on x86_64. - Currently the entry size is limited in 15 * sizeof(long). - If there is too many fprobe exit handler set on the same function, it will fail to probe. Signed-off-by: Masami Hiramatsu (Google) --- Changes in v12: - Skip updating ftrace hash if not required. Changes in v9: - Remove unneeded prototype of ftrace_regs_get_return_address(). - Fix entry data address calculation. - Remove DIV_ROUND_UP() from hotpath. Changes in v8: - Use trace_func_graph_ret/ent_t for fgraph_ops. - Update CONFIG_FPROBE dependencies. - Add ftrace_regs_get_return_address() for each arch. Changes in v3: - Update for new reserve_data/retrieve_data API. - Fix internal push/pop on fgraph data logic so that it can correctly save/restore the returning fprobes. Changes in v2: - Add more lockdep_assert_held(fprobe_mutex) - Use READ_ONCE() and WRITE_ONCE() for fprobe_hlist_node::fp. - Add NOKPROBE_SYMBOL() for the functions which is called from entry/exit callback. --- arch/arm64/include/asm/ftrace.h |6 arch/loongarch/include/asm/ftrace.h |6 arch/powerpc/include/asm/ftrace.h |6 arch/s390/include/asm/ftrace.h |6 arch/x86/include/asm/ftrace.h |6 include/linux/fprobe.h | 53 ++- kernel/trace/Kconfig|8 kernel/trace/fprobe.c | 639 +-- lib/test_fprobe.c | 45 -- 9 files changed, 530 insertions(+), 245 deletions(-) diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h index 14ecb9a418d9..27e32f323048 100644 --- a/arch/arm64/include/asm/ftrace.h +++ b/arch/arm64/include/asm/ftrace.h @@ -132,6 +132,12 @@ ftrace_regs_get_frame_pointer(const struct ftrace_regs *fregs) return fregs->fp; } +static __always_inline unsigned long +ftrace_regs_get_return_address(const struct ftrace_regs *fregs) +{ + return fregs->lr; +} + static __always_inline struct pt_regs * ftrace_partial_regs(const struct ftrace_regs *fregs, struct pt_regs *regs) { diff --git a/arch/loongarch/include/asm/ftrace.h b/arch/loongarch/include/asm/ftrace.h index 1a73f35ea9af..c021aa3194f3 100644 --- a/arch/loongarch/include/asm/ftrace.h +++ b/arch/loongarch/include/asm/ftrace.h @@ -80,6 +80,12 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs, unsigned long ip) #define ftrace_regs_get_frame_pointer(fregs) \ ((fregs)->regs.regs[22]) +static __always_inline unsigned long +ftrace_regs_get_return_address(struct ftrace_regs *fregs) +{ + return *(unsigned long *)(fregs->regs.regs[1]); +} + #define ftrace_graph_func ftrace_graph_func void ftrace_graph_func(unsigned long ip, unsigned long parent_ip, struct ftrace_ops *op, struct ftrace_regs *fregs); diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h index e6ff6834bf7e..2a2d070dd23c 100644 --- a/arch/powerpc/include/asm/ftrace.h +++ b/arch/powerpc/include/asm/ftrace.h @@ -75,6 +75,12 @@ ftrace_regs_get_instruction_pointer(struct ftrace_regs *fregs) #define ftrace_regs_query_register_offset(name) \ regs_query_register_offset(name) +static __always_inline unsigned long +ftrace_regs_get_return_address(struct ftrace_regs *fregs) +{ + return fregs->regs.link; +} + struct ftrace_ops; #define ftrace_graph_func ftrace_graph_func diff --git a/arch/s390/include/asm/ftrace.h b/arch/s390/include/asm/ftrace.h index 0d9f6df21f81..7b80ff4d3386 100644 --- a/arch/s390/include/asm/ftrace.h +++ b/arch/s390/include/asm/ftrace.h @@ -84,6 +84,12 @@ ftrace_regs_get_frame_pointer(struct ftrace_regs *fregs) return sp[0]; /* return backchain */ } +static __always_inline unsigned long +ftrace_regs_get_return_address(const struct ftrace_regs *fregs) +{ + return fregs->regs.gprs[14]; +} + #define arch_ftrace_fill_perf_regs(fregs, _regs)do { \ (_regs)->psw.addr = (fregs)->regs.psw.addr; \ (_regs)->gprs[15] = (fregs)->regs.gprs[15]; \ diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h index 1f4d1f7b19ed..8472ba394091 100644 --- a/arch/x86/include/asm/ftrace.h +++ b/arch/x86/include/asm/ftrace.h @@ -74,6 +74,12 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs) #define ftrace_regs_get_frame_pointer(fregs) \ frame_pointer(&(fregs)->regs) +static __always_inline unsigned long +ftrace_regs_get_return_address(struct ftrace_regs *fregs) +{ + return *(unsigned long *)ftrace_regs_get_stack_pointer(fregs); +} + struct ftrace_ops; #define ftrace_graph_func ftrace_graph_func v
[PATCH v13 15/20] tracing: Fix function timing profiler to initialize hashtable
From: Masami Hiramatsu (Google) Since the new fgraph requires to initialize fgraph_ops.ops.func_hash before calling register_ftrace_graph(), initialize it with default (tracing all functions) parameter. Signed-off-by: Masami Hiramatsu (Google) --- kernel/trace/ftrace.c |4 1 file changed, 4 insertions(+) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index fd6c5a50c5e5..c55cf21fd53c 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -885,6 +885,10 @@ static void profile_graph_return(struct ftrace_graph_ret *trace, } static struct fgraph_ops fprofiler_ops = { + .ops = { + .flags = FTRACE_OPS_FL_INITIALIZED, + INIT_OPS_HASH(fprofiler_ops.ops) + }, .entryfunc = &profile_graph_entry, .retfunc = &profile_graph_return, };
[PATCH v13 16/20] tracing/fprobe: Remove nr_maxactive from fprobe
From: Masami Hiramatsu (Google) Remove depercated fprobe::nr_maxactive. This involves fprobe events to rejects the maxactive number. Signed-off-by: Masami Hiramatsu (Google) --- Changes in v2: - Newly added. --- include/linux/fprobe.h |2 -- kernel/trace/trace_fprobe.c | 44 ++- 2 files changed, 6 insertions(+), 40 deletions(-) diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h index 2d06bbd99601..a86b3e4df2a0 100644 --- a/include/linux/fprobe.h +++ b/include/linux/fprobe.h @@ -54,7 +54,6 @@ struct fprobe_hlist { * @nmissed: The counter for missing events. * @flags: The status flag. * @entry_data_size: The private data storage size. - * @nr_maxactive: The max number of active functions. (*deprecated) * @entry_handler: The callback function for function entry. * @exit_handler: The callback function for function exit. * @hlist_array: The fprobe_hlist for fprobe search from IP hash table. @@ -63,7 +62,6 @@ struct fprobe { unsigned long nmissed; unsigned intflags; size_t entry_data_size; - int nr_maxactive; fprobe_entry_cb entry_handler; fprobe_exit_cb exit_handler; diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c index 86cd6a8c806a..20ef5cd5d419 100644 --- a/kernel/trace/trace_fprobe.c +++ b/kernel/trace/trace_fprobe.c @@ -422,7 +422,6 @@ static struct trace_fprobe *alloc_trace_fprobe(const char *group, const char *event, const char *symbol, struct tracepoint *tpoint, - int maxactive, int nargs, bool is_return) { struct trace_fprobe *tf; @@ -442,7 +441,6 @@ static struct trace_fprobe *alloc_trace_fprobe(const char *group, tf->fp.entry_handler = fentry_dispatcher; tf->tpoint = tpoint; - tf->fp.nr_maxactive = maxactive; ret = trace_probe_init(&tf->tp, event, group, false, nargs); if (ret < 0) @@ -1021,12 +1019,11 @@ static int __trace_fprobe_create(int argc, const char *argv[]) * FETCHARG:TYPE : use TYPE instead of unsigned long. */ struct trace_fprobe *tf = NULL; - int i, len, new_argc = 0, ret = 0; + int i, new_argc = 0, ret = 0; bool is_return = false; char *symbol = NULL; const char *event = NULL, *group = FPROBE_EVENT_SYSTEM; const char **new_argv = NULL; - int maxactive = 0; char buf[MAX_EVENT_NAME_LEN]; char gbuf[MAX_EVENT_NAME_LEN]; char sbuf[KSYM_NAME_LEN]; @@ -1048,33 +1045,13 @@ static int __trace_fprobe_create(int argc, const char *argv[]) trace_probe_log_init("trace_fprobe", argc, argv); - event = strchr(&argv[0][1], ':'); - if (event) - event++; - - if (isdigit(argv[0][1])) { - if (event) - len = event - &argv[0][1] - 1; - else - len = strlen(&argv[0][1]); - if (len > MAX_EVENT_NAME_LEN - 1) { - trace_probe_log_err(1, BAD_MAXACT); - goto parse_error; - } - memcpy(buf, &argv[0][1], len); - buf[len] = '\0'; - ret = kstrtouint(buf, 0, &maxactive); - if (ret || !maxactive) { + if (argv[0][1] != '\0') { + if (argv[0][1] != ':') { + trace_probe_log_set_index(0); trace_probe_log_err(1, BAD_MAXACT); goto parse_error; } - /* fprobe rethook instances are iterated over via a list. The -* maximum should stay reasonable. -*/ - if (maxactive > RETHOOK_MAXACTIVE_MAX) { - trace_probe_log_err(1, MAXACT_TOO_BIG); - goto parse_error; - } + event = &argv[0][2]; } trace_probe_log_set_index(1); @@ -1084,12 +1061,6 @@ static int __trace_fprobe_create(int argc, const char *argv[]) if (ret < 0) goto parse_error; - if (!is_return && maxactive) { - trace_probe_log_set_index(0); - trace_probe_log_err(1, BAD_MAXACT_TYPE); - goto parse_error; - } - trace_probe_log_set_index(0); if (event) { ret = traceprobe_parse_event_name(&event, &group, gbuf, @@ -1147,8 +1118,7 @@ static int __trace_fprobe_create(int argc, const char *argv[]) goto out; /* setup a probe */ - tf = alloc_trace_fprobe(group, event, symbol, tpoint, maxactive, - argc, is_return); + tf = alloc_trac
[PATCH v13 17/20] selftests: ftrace: Remove obsolate maxactive syntax check
From: Masami Hiramatsu (Google) Since the fprobe event does not support maxactive anymore, stop testing the maxactive syntax error checking. Signed-off-by: Masami Hiramatsu (Google) --- .../ftrace/test.d/dynevent/fprobe_syntax_errors.tc |4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc index 61877d166451..c9425a34fae3 100644 --- a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc +++ b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc @@ -16,9 +16,7 @@ aarch64) REG=%r0 ;; esac -check_error 'f^100 vfs_read' # MAXACT_NO_KPROBE -check_error 'f^1a111 vfs_read' # BAD_MAXACT -check_error 'f^10 vfs_read'# MAXACT_TOO_BIG +check_error 'f^100 vfs_read' # BAD_MAXACT check_error 'f ^non_exist_func'# BAD_PROBE_ADDR (enoent) check_error 'f ^vfs_read+10' # BAD_PROBE_ADDR
[PATCH v13 18/20] selftests/ftrace: Add a test case for repeating register/unregister fprobe
From: Masami Hiramatsu (Google) This test case repeats define and undefine the fprobe dynamic event to ensure that the fprobe does not cause any issue with such operations. Signed-off-by: Masami Hiramatsu (Google) --- .../test.d/dynevent/add_remove_fprobe_repeat.tc| 19 +++ 1 file changed, 19 insertions(+) create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc new file mode 100644 index ..b4ad09237e2a --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc @@ -0,0 +1,19 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: Generic dynamic event - Repeating add/remove fprobe events +# requires: dynamic_events "f[:[/][]] [%return] []":README + +echo 0 > events/enable +echo > dynamic_events + +PLACE=$FUNCTION_FORK +REPEAT_TIMES=64 + +for i in `seq 1 $REPEAT_TIMES`; do + echo "f:myevent $PLACE" >> dynamic_events + grep -q myevent dynamic_events + test -d events/fprobes/myevent + echo > dynamic_events +done + +clear_trace
[PATCH v13 19/20] Documentation: probes: Update fprobe on function-graph tracer
From: Masami Hiramatsu (Google) Update fprobe documentation for the new fprobe on function-graph tracer. This includes some bahvior changes and pt_regs to ftrace_regs interface change. Signed-off-by: Masami Hiramatsu (Google) --- Changes in v2: - Update @fregs parameter explanation. --- Documentation/trace/fprobe.rst | 42 ++-- 1 file changed, 27 insertions(+), 15 deletions(-) diff --git a/Documentation/trace/fprobe.rst b/Documentation/trace/fprobe.rst index 196f52386aaa..f58bdc64504f 100644 --- a/Documentation/trace/fprobe.rst +++ b/Documentation/trace/fprobe.rst @@ -9,9 +9,10 @@ Fprobe - Function entry/exit probe Introduction -Fprobe is a function entry/exit probe mechanism based on ftrace. -Instead of using ftrace full feature, if you only want to attach callbacks -on function entry and exit, similar to the kprobes and kretprobes, you can +Fprobe is a function entry/exit probe mechanism based on the function-graph +tracer. +Instead of tracing all functions, if you want to attach callbacks on specific +function entry and exit, similar to the kprobes and kretprobes, you can use fprobe. Compared with kprobes and kretprobes, fprobe gives faster instrumentation for multiple functions with single handler. This document describes how to use fprobe. @@ -91,12 +92,14 @@ The prototype of the entry/exit callback function are as follows: .. code-block:: c - int entry_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct pt_regs *regs, void *entry_data); + int entry_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct ftrace_regs *fregs, void *entry_data); - void exit_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct pt_regs *regs, void *entry_data); + void exit_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct ftrace_regs *fregs, void *entry_data); -Note that the @entry_ip is saved at function entry and passed to exit handler. -If the entry callback function returns !0, the corresponding exit callback will be cancelled. +Note that the @entry_ip is saved at function entry and passed to exit +handler. +If the entry callback function returns !0, the corresponding exit callback +will be cancelled. @fp This is the address of `fprobe` data structure related to this handler. @@ -112,12 +115,10 @@ If the entry callback function returns !0, the corresponding exit callback will This is the return address that the traced function will return to, somewhere in the caller. This can be used at both entry and exit. -@regs -This is the `pt_regs` data structure at the entry and exit. Note that -the instruction pointer of @regs may be different from the @entry_ip -in the entry_handler. If you need traced instruction pointer, you need -to use @entry_ip. On the other hand, in the exit_handler, the instruction -pointer of @regs is set to the current return address. +@fregs +This is the `ftrace_regs` data structure at the entry and exit. This +includes the function parameters, or the return values. So user can +access thos values via appropriate `ftrace_regs_*` APIs. @entry_data This is a local storage to share the data between entry and exit handlers. @@ -125,6 +126,17 @@ If the entry callback function returns !0, the corresponding exit callback will and `entry_data_size` field when registering the fprobe, the storage is allocated and passed to both `entry_handler` and `exit_handler`. +Entry data size and exit handlers on the same function +== + +Since the entry data is passed via per-task stack and it is has limited size, +the entry data size per probe is limited to `15 * sizeof(long)`. You also need +to take care that the different fprobes are probing on the same function, this +limit becomes smaller. The entry data size is aligned to `sizeof(long)` and +each fprobe which has exit handler uses a `sizeof(long)` space on the stack, +you should keep the number of fprobes on the same function as small as +possible. + Share the callbacks with kprobes @@ -165,8 +177,8 @@ This counter counts up when; - fprobe fails to take ftrace_recursion lock. This usually means that a function which is traced by other ftrace users is called from the entry_handler. - - fprobe fails to setup the function exit because of the shortage of rethook - (the shadow stack for hooking the function return.) + - fprobe fails to setup the function exit because of failing to allocate the + data buffer from the per-task shadow stack. The `fprobe::nmissed` field counts up in both cases. Therefore, the former skips both of entry and exit callback and the latter skips the exit
[PATCH v13 20/20] fgraph: Skip recording calltime/rettime if it is not nneeded
From: Masami Hiramatsu (Google) Skip recording calltime and rettime if the fgraph_ops does not need it. This is a kind of performance optimization for fprobe. Since the fprobe user does not use these entries, recording timestamp in fgraph is just a overhead (e.g. eBPF, ftrace). So introduce the skip_timestamp flag, and all fgraph_ops sets this flag, skip recording calltime and rettime. Here is the performance results measured by tools/testing/selftests/bpf/benchs/run_bench_trigger.sh Without this: kprobe-multi :5.700 ± 0.065M/s kretprobe-multi:4.239 ± 0.006M/s With skip-timestamp: kprobe-multi :6.265 ± 0.033M/s+9.91% kretprobe-multi:4.758 ± 0.009M/s+12.24% Suggested-by: Jiri Olsa Signed-off-by: Masami Hiramatsu (Google) --- Changes in v11: - Simplify it to be symmetric on push and pop. (Thus the timestamp getting place is a bit shifted.) Changes in v10: - Add likely() to skipping timestamp. Changes in v9: - Newly added. --- include/linux/ftrace.h |2 ++ kernel/trace/fgraph.c | 36 +--- kernel/trace/fprobe.c |1 + 3 files changed, 36 insertions(+), 3 deletions(-) diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 63fb91088a23..bab6fabb3fa1 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -1160,6 +1160,8 @@ struct fgraph_ops { void*private; trace_func_graph_ent_t saved_func; int idx; + /* If skip_timestamp is true, this does not record timestamps. */ + boolskip_timestamp; }; void *fgraph_reserve_data(int idx, int size_bytes); diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c index 6a3e2db16aa4..c116a92839ae 100644 --- a/kernel/trace/fgraph.c +++ b/kernel/trace/fgraph.c @@ -174,6 +174,7 @@ int ftrace_graph_active; static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE]; static unsigned long fgraph_array_bitmask; +static bool fgraph_skip_timestamp; /* LRU index table for fgraph_array */ static int fgraph_lru_table[FGRAPH_ARRAY_SIZE]; @@ -557,7 +558,11 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func, return -EBUSY; } - calltime = trace_clock_local(); + /* This is not really 'likely' but for keeping the least path to be faster. */ + if (likely(fgraph_skip_timestamp)) + calltime = 0LL; + else + calltime = trace_clock_local(); offset = READ_ONCE(current->curr_ret_stack); ret_stack = RET_STACK(current, offset); @@ -728,6 +733,12 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret, *ret = ret_stack->ret; trace->func = ret_stack->func; trace->calltime = ret_stack->calltime; + /* This is not really 'likely' but for keeping the least path to be faster. */ + if (likely(!trace->calltime)) + trace->rettime = 0LL; + else + trace->rettime = trace_clock_local(); + trace->overrun = atomic_read(¤t->trace_overrun); trace->depth = current->curr_ret_depth; /* @@ -788,7 +799,6 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe return (unsigned long)panic; } - trace.rettime = trace_clock_local(); if (fregs) ftrace_regs_set_instruction_pointer(fregs, ret); @@ -1248,6 +1258,24 @@ static void ftrace_graph_disable_direct(bool disable_branch) fgraph_direct_gops = &fgraph_stub; } +static void update_fgraph_skip_timestamp(void) +{ + int i; + + for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) { + struct fgraph_ops *gops = fgraph_array[i]; + + if (gops == &fgraph_stub) + continue; + + if (!gops->skip_timestamp) { + fgraph_skip_timestamp = false; + return; + } + } + fgraph_skip_timestamp = true; +} + int register_ftrace_graph(struct fgraph_ops *gops) { int command = 0; @@ -1271,6 +1299,7 @@ int register_ftrace_graph(struct fgraph_ops *gops) gops->idx = i; ftrace_graph_active++; + update_fgraph_skip_timestamp(); if (ftrace_graph_active == 2) ftrace_graph_disable_direct(true); @@ -1303,6 +1332,7 @@ int register_ftrace_graph(struct fgraph_ops *gops) ftrace_graph_active--; gops->saved_func = NULL; fgraph_lru_release_index(i); + update_fgraph_skip_timestamp(); } out: mutex_unlock(&ftrace_lock); @@ -1326,8 +1356,8 @@ void unregister_ftrace_graph(struct fgraph_ops *gops) goto out; fgraph_array[gops->idx] = &fgraph_stub; - ftrace_graph_active--; + update_fgraph_skip_timestamp(); if (!ftrace_graph_active)
[PATCH] usb: typec: fsa4480: Relax CHIP_ID check
Some FSA4480-compatible chips like the OCP96011 used on Fairphone 5 return 0x00 from the CHIP_ID register. Handle that gracefully and only fail probe when the I2C read has failed. With this the dev_dbg will print 0 but otherwise continue working. [0.251581] fsa4480 1-0042: Found FSA4480 v0.0 (Vendor ID = 0) Cc: sta...@vger.kernel.org Fixes: e885f5f1f2b4 ("usb: typec: fsa4480: Check if the chip is really there") Signed-off-by: Luca Weiss --- drivers/usb/typec/mux/fsa4480.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/typec/mux/fsa4480.c b/drivers/usb/typec/mux/fsa4480.c index cd235339834b..f71dba8bf07c 100644 --- a/drivers/usb/typec/mux/fsa4480.c +++ b/drivers/usb/typec/mux/fsa4480.c @@ -274,7 +274,7 @@ static int fsa4480_probe(struct i2c_client *client) return dev_err_probe(dev, PTR_ERR(fsa->regmap), "failed to initialize regmap\n"); ret = regmap_read(fsa->regmap, FSA4480_DEVICE_ID, &val); - if (ret || !val) + if (ret) return dev_err_probe(dev, -ENODEV, "FSA4480 not found\n"); dev_dbg(dev, "Found FSA4480 v%lu.%lu (Vendor ID = %lu)\n", --- base-commit: ccdbf91fdf5a71881ef32b41797382c4edd6f670 change-id: 20240818-fsa4480-chipid-fix-2c7cf5810135 Best regards, -- Luca Weiss
Re: [PATCH] usb: typec: fsa4480: Relax CHIP_ID check
On Sun Aug 18, 2024 at 10:21 PM CEST, Luca Weiss wrote: > Some FSA4480-compatible chips like the OCP96011 used on Fairphone 5 > return 0x00 from the CHIP_ID register. Handle that gracefully and only > fail probe when the I2C read has failed. > > With this the dev_dbg will print 0 but otherwise continue working. > > [0.251581] fsa4480 1-0042: Found FSA4480 v0.0 (Vendor ID = 0) Short appendix: just checked the OCP96011 datasheet and it does mention register 00H being "Device ID" and "Reset Value" being 0x00 so that's expected behavior on this specific chip. Regards Luca > > Cc: sta...@vger.kernel.org > Fixes: e885f5f1f2b4 ("usb: typec: fsa4480: Check if the chip is really there") > Signed-off-by: Luca Weiss > --- > drivers/usb/typec/mux/fsa4480.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/usb/typec/mux/fsa4480.c b/drivers/usb/typec/mux/fsa4480.c > index cd235339834b..f71dba8bf07c 100644 > --- a/drivers/usb/typec/mux/fsa4480.c > +++ b/drivers/usb/typec/mux/fsa4480.c > @@ -274,7 +274,7 @@ static int fsa4480_probe(struct i2c_client *client) > return dev_err_probe(dev, PTR_ERR(fsa->regmap), "failed to > initialize regmap\n"); > > ret = regmap_read(fsa->regmap, FSA4480_DEVICE_ID, &val); > - if (ret || !val) > + if (ret) > return dev_err_probe(dev, -ENODEV, "FSA4480 not found\n"); > > dev_dbg(dev, "Found FSA4480 v%lu.%lu (Vendor ID = %lu)\n", > > --- > base-commit: ccdbf91fdf5a71881ef32b41797382c4edd6f670 > change-id: 20240818-fsa4480-chipid-fix-2c7cf5810135 > > Best regards,
[PATCH v3] remoteproc: xlnx: add sram support
AMD-Xilinx zynqmp platform contains on-chip sram memory (OCM). R5 cores can access OCM and access is faster than DDR memory but slower than TCM memories available. Sram region can have optional multiple power-domains. Platform management firmware is responsible to operate these power-domains. Signed-off-by: Tanmay Shah --- Changes in v2: - Expand commit message with power-domains related information. Changes in v3: - make @sram an array rather than an array of pointers - fix of_node_put usage to maintain proper refcount of node - s/proprty/property - Use gen pool framework for mapping sram address space. drivers/remoteproc/xlnx_r5_remoteproc.c | 155 1 file changed, 155 insertions(+) diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c index 2cea97c746fd..1f704b99a67d 100644 --- a/drivers/remoteproc/xlnx_r5_remoteproc.c +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -56,6 +57,21 @@ struct mem_bank_data { char *bank_name; }; +/** + * struct zynqmp_sram_bank - sram bank description + * + * @sram_pool: gen pool for his sram + * @sram_res: sram address region information + * @va: virtual address of allocated genpool + * @da: device address of sram + */ +struct zynqmp_sram_bank { + struct gen_pool *sram_pool; + struct resource sram_res; + void __iomem *va; + u32 da; +}; + /** * struct mbox_info * @@ -120,6 +136,8 @@ static const struct mem_bank_data zynqmp_tcm_banks_lockstep[] = { * struct zynqmp_r5_core * * @rsc_tbl_va: resource table virtual address + * @sram: Array of sram memories assigned to this core + * @num_sram: number of sram for this core * @dev: device of RPU instance * @np: device node of RPU instance * @tcm_bank_count: number TCM banks accessible to this RPU @@ -131,6 +149,8 @@ static const struct mem_bank_data zynqmp_tcm_banks_lockstep[] = { */ struct zynqmp_r5_core { void __iomem *rsc_tbl_va; + struct zynqmp_sram_bank *sram; + int num_sram; struct device *dev; struct device_node *np; int tcm_bank_count; @@ -494,6 +514,46 @@ static int add_mem_regions_carveout(struct rproc *rproc) return 0; } +static int add_sram_carveouts(struct rproc *rproc) +{ + struct zynqmp_r5_core *r5_core = rproc->priv; + struct rproc_mem_entry *rproc_mem; + struct zynqmp_sram_bank *sram; + size_t len, pool_size; + dma_addr_t dma_addr; + int da, i; + + for (i = 0; i < r5_core->num_sram; i++) { + sram = &r5_core->sram[i]; + + dma_addr = (dma_addr_t)sram->sram_res.start; + len = resource_size(&sram->sram_res); + da = sram->da; + + pool_size = gen_pool_size(sram[i].sram_pool); + sram->va = (void __iomem *)gen_pool_alloc(sram->sram_pool, pool_size); + if (!sram->va) { + dev_err(r5_core->dev, "failed to alloc sram idx %d pool\n", i); + return -ENOMEM; + } + + /* Register associated reserved memory regions */ + rproc_mem = rproc_mem_entry_init(&rproc->dev, sram->va, +(dma_addr_t)dma_addr, +len, da, +NULL, NULL, +sram->sram_res.name); + + rproc_add_carveout(rproc, rproc_mem); + rproc_coredump_add_segment(rproc, da, len); + + dev_dbg(&rproc->dev, "sram carveout %s addr=%llx, da=0x%x, size=0x%lx", + sram->sram_res.name, dma_addr, da, len); + } + + return 0; +} + /* * tcm_mem_unmap() * @rproc: single R5 core's corresponding rproc instance @@ -669,6 +729,12 @@ static int zynqmp_r5_rproc_prepare(struct rproc *rproc) return ret; } + ret = add_sram_carveouts(rproc); + if (ret) { + dev_err(&rproc->dev, "failed to get sram carveout %d\n", ret); + return ret; + } + return 0; } @@ -695,6 +761,12 @@ static int zynqmp_r5_rproc_unprepare(struct rproc *rproc) "can't turn off TCM bank 0x%x", pm_domain_id); } + for (i = 0; i < r5_core->num_sram; i++) { + gen_pool_free(r5_core->sram[i].sram_pool, + (unsigned long)r5_core->sram[i].va, + gen_pool_size(r5_core->sram[i].sram_pool)); + } + return 0; } @@ -881,6 +953,85 @@ static struct zynqmp_r5_core *zynqmp_r5_add_rproc_core(struct device *cdev) return ERR_PTR(ret); } +static int zynqmp_r5_get_sram_banks(struct zynqmp_r5_core *r5_core) +{ + struct device_node *np = r5_core->np; + struct devi
Re: [PATCH v2 1/2] nvdimm: Fix devs leaks in scan_labels()
(Just back from the summer holiday) Sorry for the late reply. On 10/08/2024 06:38, Dan Williams wrote: > I notice this patch is not upstream yet. Let's try to get it over the > goal line. > > Li Zhijian wrote: >> The leakage would happen when create_namespace_pmem() meets an invalid >> label which gets failure in validating isetcookie. > > I would rewrite this as: > > "scan_labels() leaks memory when label scanning fails and it falls back > to just creating a default "seed" namespace for userspace to configure. > Root can force the kernel to leak memory." It sounds good to me. > > ...then a distribution developer knows the urgency to backport this fix. > >> Try to resuse the devs that may have already been allocated with size >> (2 * sizeof(dev)) previously. > > Rather than conditionally reallocating I think it would be better to > unconditionally allocate the minimum, something like: Okay, I will update it in V3. > > diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c > index d6d558f94d6b..1c38c93bee21 100644 > --- a/drivers/nvdimm/namespace_devs.c > +++ b/drivers/nvdimm/namespace_devs.c > @@ -1937,12 +1937,16 @@ static int cmp_dpa(const void *a, const void *b) > static struct device **scan_labels(struct nd_region *nd_region) > { > int i, count = 0; > - struct device *dev, **devs = NULL; > + struct device *dev, **devs; > struct nd_label_ent *label_ent, *e; > struct nd_mapping *nd_mapping = &nd_region->mapping[0]; > struct nvdimm_drvdata *ndd = to_ndd(nd_mapping); > resource_size_t map_end = nd_mapping->start + nd_mapping->size - 1; > > + devs = kcalloc(2, sizeof(dev), GFP_KERNEL); > + if (!devs) > + return NULL; > + > /* "safe" because create_namespace_pmem() might list_move() > label_ent */ > list_for_each_entry_safe(label_ent, e, &nd_mapping->labels, list) { > struct nd_namespace_label *nd_label = label_ent->label; > @@ -1961,12 +1965,14 @@ static struct device **scan_labels(struct nd_region > *nd_region) > goto err; > if (i < count) > continue; > - __devs = kcalloc(count + 2, sizeof(dev), GFP_KERNEL); > - if (!__devs) > - goto err; > - memcpy(__devs, devs, sizeof(dev) * count); > - kfree(devs); > - devs = __devs; > + if (count) { > + __devs = kcalloc(count + 2, sizeof(dev), GFP_KERNEL); > + if (!__devs) > + goto err; > + memcpy(__devs, devs, sizeof(dev) * count); > + kfree(devs); > + devs = __devs; > + } > > dev = create_namespace_pmem(nd_region, nd_mapping, nd_label); > if (IS_ERR(dev)) { > @@ -1994,10 +2000,6 @@ static struct device **scan_labels(struct nd_region > *nd_region) > /* Publish a zero-sized namespace for userspace to > configure. */ > nd_mapping_free_labels(nd_mapping); > > - devs = kcalloc(2, sizeof(dev), GFP_KERNEL); > - if (!devs) > - goto err; > - > nspm = kzalloc(sizeof(*nspm), GFP_KERNEL); > if (!nspm) > goto err; > @@ -2036,12 +2038,10 @@ static struct device **scan_labels(struct nd_region > *nd_region) > return devs; > >err: > - if (devs) { > - for (i = 0; devs[i]; i++) > - namespace_pmem_release(devs[i]); > - kfree(devs); > - } > - return NULL; > +for (i = 0; devs[i]; i++) > +namespace_pmem_release(devs[i]); > +kfree(devs); > +return NULL; > } > > static struct device **create_namespaces(struct nd_region *nd_region) > > >> A kmemleak reports: >> unreferenced object 0x88800dda1980 (size 16): >>comm "kworker/u10:5", pid 69, jiffies 4294671781 >>hex dump (first 16 bytes): >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >>backtrace (crc 0): >> [] __kmalloc+0x32c/0x470 >> [<9ed43c83>] nd_region_register_namespaces+0x6fb/0x1120 >> [libnvdimm] >> [<0e07a65c>] nd_region_probe+0xfe/0x210 [libnvdimm] >> [<7b79ce5f>] nvdimm_bus_probe+0x7a/0x1e0 [libnvdimm] >> [ ] really_probe+0xc6/0x390 >> [<129e2a69>] __driver_probe_device+0x78/0x150 >> [<2dfed28b>] driver_probe_device+0x1e/0x90 >> [ ] __device_attach_driver+0x85/0x110 >> [<32dca295>] bus_for_each_drv+0x85/0xe0 >> [<391c5a7d>] __device_attach+0xbe/0x1e0 >> [<26dabec0>] bus_probe_device+0x94/0xb0 >> [ ] devi
Re: [syzbot] [mm?] possible deadlock in __mmap_lock_do_trace_released
syzbot suspects this issue was fixed by commit: commit 7d6be67cfdd4a53cea7147313ca13c531e3a470f Author: Tetsuo Handa Date: Fri Jun 21 01:08:41 2024 + mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=12d4889398 start commit: a12978712d90 selftests/bpf: Move ARRAY_SIZE to bpf_misc.h git tree: bpf-next kernel config: https://syzkaller.appspot.com/x/.config?x=736daf12bd72e034 dashboard link: https://syzkaller.appspot.com/bug?extid=16b6ab88e66b34d09014 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=125718be98 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1452887698 If the result looks correct, please mark the issue as fixed by replying with: #syz fix: mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer For information about bisection process see: https://goo.gl/tpsmEJ#bisection
Re: [PATCH v2 1/2] nvdimm: Fix devs leaks in scan_labels()
On 10/08/2024 06:55, Dan Williams wrote: > Dan Williams wrote: > [..] >> @@ -2036,12 +2038,10 @@ static struct device **scan_labels(struct nd_region >> *nd_region) > > ...of course you would also need something like: > > if (!count) { > kfree(devs); > return NULL; > } It seems we don't need this cleanup, in the count=0 case, we would reach `err` to free devs. Thanks Zhijian > > ...here, I'll leave that to you to fix up and test. > >> return devs; >> >>err: >> - if (devs) { >> - for (i = 0; devs[i]; i++) >> - namespace_pmem_release(devs[i]); >> - kfree(devs); >> - } >> - return NULL; >> +for (i = 0; devs[i]; i++) >> +namespace_pmem_release(devs[i]); >> +kfree(devs); >> +return NULL; >> } >>
[PATCH v3 1/2] nvdimm: Fix devs leaks in scan_labels()
scan_labels() leaks memory when label scanning fails and it falls back to just creating a default "seed" namespace for userspace to configure. Root can force the kernel to leak memory. Allocate the minimum resources unconditionally and release them when unneeded to avoid the memory leak. A kmemleak reports: unreferenced object 0x88800dda1980 (size 16): comm "kworker/u10:5", pid 69, jiffies 4294671781 hex dump (first 16 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 backtrace (crc 0): [] __kmalloc+0x32c/0x470 [<9ed43c83>] nd_region_register_namespaces+0x6fb/0x1120 [libnvdimm] [<0e07a65c>] nd_region_probe+0xfe/0x210 [libnvdimm] [<7b79ce5f>] nvdimm_bus_probe+0x7a/0x1e0 [libnvdimm] [ ] really_probe+0xc6/0x390 [<129e2a69>] __driver_probe_device+0x78/0x150 [<2dfed28b>] driver_probe_device+0x1e/0x90 [ ] __device_attach_driver+0x85/0x110 [<32dca295>] bus_for_each_drv+0x85/0xe0 [<391c5a7d>] __device_attach+0xbe/0x1e0 [<26dabec0>] bus_probe_device+0x94/0xb0 [ ] device_add+0x656/0x870 [<3d69bfaa>] nd_async_device_register+0xe/0x50 [libnvdimm] [<3f4c52a4>] async_run_entry_fn+0x2e/0x110 [ ] process_one_work+0x1ee/0x600 [<6d90d5a9>] worker_thread+0x183/0x350 Cc: Dave Jiang Cc: Ira Weiny Fixes: 1b40e09a1232 ("libnvdimm: blk labels and namespace instantiation") Suggested-by: Dan Williams Signed-off-by: Li Zhijian Reviewed-by: Dan Williams --- V3: update commit log and allocate the minimum(2 *dev) unconditionally. # Dan V2: update description and comment Signed-off-by: Li Zhijian --- drivers/nvdimm/namespace_devs.c | 34 - 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c index d6d558f94d6b..35d9f3cc2efa 100644 --- a/drivers/nvdimm/namespace_devs.c +++ b/drivers/nvdimm/namespace_devs.c @@ -1937,12 +1937,16 @@ static int cmp_dpa(const void *a, const void *b) static struct device **scan_labels(struct nd_region *nd_region) { int i, count = 0; - struct device *dev, **devs = NULL; + struct device *dev, **devs; struct nd_label_ent *label_ent, *e; struct nd_mapping *nd_mapping = &nd_region->mapping[0]; struct nvdimm_drvdata *ndd = to_ndd(nd_mapping); resource_size_t map_end = nd_mapping->start + nd_mapping->size - 1; + devs = kcalloc(2, sizeof(dev), GFP_KERNEL); + if (!devs) + return NULL; + /* "safe" because create_namespace_pmem() might list_move() label_ent */ list_for_each_entry_safe(label_ent, e, &nd_mapping->labels, list) { struct nd_namespace_label *nd_label = label_ent->label; @@ -1961,12 +1965,14 @@ static struct device **scan_labels(struct nd_region *nd_region) goto err; if (i < count) continue; - __devs = kcalloc(count + 2, sizeof(dev), GFP_KERNEL); - if (!__devs) - goto err; - memcpy(__devs, devs, sizeof(dev) * count); - kfree(devs); - devs = __devs; + if (count) { + __devs = kcalloc(count + 2, sizeof(dev), GFP_KERNEL); + if (!__devs) + goto err; + memcpy(__devs, devs, sizeof(dev) * count); + kfree(devs); + devs = __devs; + } dev = create_namespace_pmem(nd_region, nd_mapping, nd_label); if (IS_ERR(dev)) { @@ -1993,11 +1999,6 @@ static struct device **scan_labels(struct nd_region *nd_region) /* Publish a zero-sized namespace for userspace to configure. */ nd_mapping_free_labels(nd_mapping); - - devs = kcalloc(2, sizeof(dev), GFP_KERNEL); - if (!devs) - goto err; - nspm = kzalloc(sizeof(*nspm), GFP_KERNEL); if (!nspm) goto err; @@ -2036,11 +2037,10 @@ static struct device **scan_labels(struct nd_region *nd_region) return devs; err: - if (devs) { - for (i = 0; devs[i]; i++) - namespace_pmem_release(devs[i]); - kfree(devs); - } + for (i = 0; devs[i]; i++) + namespace_pmem_release(devs[i]); + kfree(devs); + return NULL; } -- 2.29.2
[PATCH v3 2/2] nvdimm: Remove dead code for ENODEV checking in scan_labels()
The only way create_namespace_pmem() returns an ENODEV code is if select_pmem_id(nd_region, &uuid) returns ENODEV when its 2nd parameter is a null pointer. However, this is impossible because &uuid is always valid. Furthermore, create_namespace_pmem() is the only user of select_pmem_id(), it's safe to remove the 'return -ENODEV' branch. Signed-off-by: Li Zhijian --- V2: new patch. It's found when I'm Reviewing/tracing the return values of create_namespace_pmem() --- drivers/nvdimm/namespace_devs.c | 9 - 1 file changed, 9 deletions(-) diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c index 35d9f3cc2efa..55cfbf1e0a95 100644 --- a/drivers/nvdimm/namespace_devs.c +++ b/drivers/nvdimm/namespace_devs.c @@ -1612,9 +1612,6 @@ static int select_pmem_id(struct nd_region *nd_region, const uuid_t *pmem_id) { int i; - if (!pmem_id) - return -ENODEV; - for (i = 0; i < nd_region->ndr_mappings; i++) { struct nd_mapping *nd_mapping = &nd_region->mapping[i]; struct nvdimm_drvdata *ndd = to_ndd(nd_mapping); @@ -1790,9 +1787,6 @@ static struct device *create_namespace_pmem(struct nd_region *nd_region, case -EINVAL: dev_dbg(&nd_region->dev, "invalid label(s)\n"); break; - case -ENODEV: - dev_dbg(&nd_region->dev, "label not found\n"); - break; default: dev_dbg(&nd_region->dev, "unexpected err: %d\n", rc); break; @@ -1980,9 +1974,6 @@ static struct device **scan_labels(struct nd_region *nd_region) case -EAGAIN: /* skip invalid labels */ continue; - case -ENODEV: - /* fallthrough to seed creation */ - break; default: goto err; } -- 2.29.2