On Mon, Nov 26, 2018 at 11:26:03AM -0500, Steven Rostedt wrote:
> On Tue, 27 Nov 2018 01:07:55 +0900
> Masami Hiramatsu <mhira...@kernel.org> wrote:
> 
> > > > --- a/include/linux/sched.h
> > > > +++ b/include/linux/sched.h
> > > > @@ -1119,7 +1119,7 @@ struct task_struct {
> > > >  	int curr_ret_depth;
> > > > 
> > > >  	/* Stack of return addresses for return function tracing: */
> > > > -	struct ftrace_ret_stack	*ret_stack;
> > > > +	unsigned long *ret_stack;
> > > > 
> > > >  	/* Timestamp for last schedule: */
> > > >  	unsigned long long ftrace_timestamp;
> > > > diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> > > > index 9b85638ecded..1389fe39f64c 100644
> > > > --- a/kernel/trace/fgraph.c
> > > > +++ b/kernel/trace/fgraph.c
> > > > @@ -23,6 +23,17 @@
> > > >  #define ASSIGN_OPS_HASH(opsname, val)
> > > >  #endif
> > > > 
> > > > +#define FGRAPH_RET_SIZE (sizeof(struct ftrace_ret_stack))
> > > > +#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
> > > > +#define SHADOW_STACK_SIZE (FTRACE_RETFUNC_DEPTH * FGRAPH_RET_SIZE)
> > > > +#define SHADOW_STACK_INDEX			\
> > > > +	(ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
> > > > +#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
> > > > +
> > > > +#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
> > > > +#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
> > > > +#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })
> > > > +
> > > 
> > > [...]
> > > 
> > > > @@ -514,7 +531,7 @@ void ftrace_graph_init_task(struct task_struct *t)
> > > > 
> > > >  void ftrace_graph_exit_task(struct task_struct *t)
> > > >  {
> > > > -	struct ftrace_ret_stack	*ret_stack = t->ret_stack;
> > > > +	unsigned long *ret_stack = t->ret_stack;
> > > > 
> > > >  	t->ret_stack = NULL;
> > > >  	/* NULL must become visible to IRQs before we free it: */
> > > > @@ -526,12 +543,10 @@ void ftrace_graph_exit_task(struct task_struct *t)
> > > >  /* Allocate a return stack for each task */
> > > >  static int start_graph_tracing(void)
> > > >  {
> > > > -	struct ftrace_ret_stack **ret_stack_list;
> > > > +	unsigned long **ret_stack_list;
> > > >  	int ret, cpu;
> > > > 
> > > > -	ret_stack_list = kmalloc_array(FTRACE_RETSTACK_ALLOC_SIZE,
> > > > -				       sizeof(struct ftrace_ret_stack *),
> > > > -				       GFP_KERNEL);
> > > > +	ret_stack_list = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
> > > > 
> > > 
> > > I had dumped the fgraph size related macros to understand the patch
> > > better, I got:
> > > 
> > > [    0.909528] val of FGRAPH_RET_SIZE is 40
> > > [    0.910250] val of FGRAPH_RET_INDEX is 5
> > > [    0.910866] val of FGRAPH_ARRAY_SIZE is 16
> > > [    0.911488] val of FGRAPH_ARRAY_MASK is 255
> > > [    0.912134] val of FGRAPH_MAX_INDEX is 16
> > > [    0.912751] val of FGRAPH_INDEX_SHIFT is 8
> > > [    0.913382] val of FGRAPH_FRAME_SIZE is 168
> > > [    0.914033] val of FGRAPH_FRAME_INDEX is 21
> > > 
> > > FTRACE_RETFUNC_DEPTH is 50
> > > [    0.914686] val of SHADOW_STACK_SIZE is 8400
> > > 
> > > I had a concern about memory overhead per-task. It seems the total memory
> > > needed per task for the stack is 8400 bytes (with my configuration, with
> > > FUNCTION_PROFILER turned off).
> > > 
> > > Whereas before it would be 32 * 40 = 1280 bytes. That looks like ~7 times
> > > more than before.
> > 
> > Hmm, this seems too big... I thought the shadow-stack size should be
> > smaller than 1 page (4kB). Steve, can we give a 4k page for shadow stack
> > and define FTRACE_RETFUNC_DEPTH = 4096 / FGRAPH_RET_SIZE ?
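To sanity-check the numbers, I mocked up the arithmetic in a tiny
userspace program. To be clear, the struct below is a stand-in I
invented just to reproduce the 40-byte size from my dump (the real
ftrace_ret_stack fields depend on config options), and ALIGN is a local
copy of the kernel macro, so treat this as a sketch only:

        /* Userspace sketch only: stand-in struct, not the kernel's. */
        #include <stdio.h>

        #define ALIGN(x, a)     (((x) + (a) - 1) & ~((a) - 1))

        /* Invented fields, chosen only to land on the 40 bytes I dumped. */
        struct ftrace_ret_stack {
                unsigned long ret;
                unsigned long func;
                unsigned long long calltime;
                unsigned long *retp;
                int depth;
        };

        #define FTRACE_RETFUNC_DEPTH    50
        #define FGRAPH_ARRAY_SIZE       16      /* callback-array slots per frame */

        #define FGRAPH_RET_SIZE         sizeof(struct ftrace_ret_stack)
        #define FGRAPH_RET_INDEX        (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
        /* The dumped values imply a frame = ret_stack + one long per array slot: */
        #define FGRAPH_FRAME_SIZE       (FGRAPH_RET_SIZE + FGRAPH_ARRAY_SIZE * sizeof(long))
        #define SHADOW_STACK_SIZE       (FTRACE_RETFUNC_DEPTH * FGRAPH_FRAME_SIZE)

        int main(void)
        {
                printf("FGRAPH_RET_SIZE    = %zu\n", FGRAPH_RET_SIZE);          /* 40 */
                printf("FGRAPH_RET_INDEX   = %zu\n", FGRAPH_RET_INDEX);         /* 5 */
                printf("FGRAPH_FRAME_SIZE  = %zu\n", FGRAPH_FRAME_SIZE);        /* 168 */
                printf("SHADOW_STACK_SIZE  = %zu\n", SHADOW_STACK_SIZE);        /* 8400 */
                printf("frames per 4k page = %zu\n", (size_t)4096 / FGRAPH_FRAME_SIZE);
                return 0;
        }

On a 64-bit build this prints 40 / 5 / 168 / 8400, matching my dump.
One note on the last line: your formula divides by FGRAPH_RET_SIZE,
which would give 102 entries per page, but if each frame keeps its full
16-slot callback array, the divisor is the 168-byte frame, and a page
would only hold 24 frames of nesting depth.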
> 
> For the first pass, I decided not to worry about the size. It made the
> code less complex :-)
> 
> Yes, I plan on working on making the size of the stack smaller, but
> that will probably be done in follow-up patches.

Cool, sounds good.
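And agreed the flat array does keep the code simple. To convince myself
of that, I played with the push/pop scheme in userspace. This mirrors
the RET_STACK*() macros from the patch (statement expressions as in the
original, so gcc/clang only), but over a bare array instead of
t->ret_stack, and with the same stand-in struct as in my sketch above;
purely my mock-up, not the series' code:

        /* Userspace mock of the flat shadow-stack push/pop. */
        #include <stdio.h>

        #define ALIGN(x, a)     (((x) + (a) - 1) & ~((a) - 1))

        struct ftrace_ret_stack {       /* same stand-in as before */
                unsigned long ret;
                unsigned long func;
                unsigned long long calltime;
                unsigned long *retp;
                int depth;
        };

        #define FGRAPH_RET_SIZE         sizeof(struct ftrace_ret_stack)
        #define FGRAPH_RET_INDEX        (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))

        /* Like the patch's macros, but over a plain array, not t->ret_stack: */
        #define RET_STACK(s, index)     ((struct ftrace_ret_stack *)(&(s)[index]))
        #define RET_STACK_INC(c)        ({ (c) += FGRAPH_RET_INDEX; })
        #define RET_STACK_DEC(c)        ({ (c) -= FGRAPH_RET_INDEX; })

        int main(void)
        {
                unsigned long stack[4 * FGRAPH_RET_INDEX];      /* room for 4 frames */
                struct ftrace_ret_stack *frame;
                int index = 0;

                /* Push two frames: a frame is just FGRAPH_RET_INDEX longs. */
                frame = RET_STACK(stack, index);
                frame->func = 0x1111;
                RET_STACK_INC(index);

                frame = RET_STACK(stack, index);
                frame->func = 0x2222;
                RET_STACK_INC(index);

                /* Pop the top frame back off. */
                RET_STACK_DEC(index);
                frame = RET_STACK(stack, index);
                printf("popped func=%#lx, index now %d\n", frame->func, index);
                return 0;
        }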
> > > On my system with ~4000 threads, that becomes ~32MB, which seems a bit
> > > wasteful, especially if there were only one or two function graph
> > > callbacks registered and most of the callback array in the stack isn't
> > > used.
> 
> Note, all 4000 threads could be doing those tracebacks, and if you are
> doing full function graph tracing, it will use a lot.

But I think each of the threads will only use N entries in the callback
array, where N is the number of function graph callback users who have
registered, right? So the remaining (total - N) allocated callback array
entries per thread will not be used.

> > > Could we make the array size configurable at compile time and start it
> > > with a small number like 4 or 6?
> > 
> > Or, we can introduce an online setting :)
> 
> Yes, that can easily be added. I didn't try to make this into the
> perfect solution; I wanted a solid one first, and then to massage it into
> something that is more efficient, both in memory consumption and
> performance.
> 
> Joel and Masami, thanks for the feedback.

I agree the first step is good so far. Looking forward to the patches.

thanks a lot,

 - Joel
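P.S. To put rough numbers on the unused-slot concern, here is a
back-of-envelope sketch using my figures from above (4000 threads,
depth 50, 16 slots of 8 bytes per frame; that every frame always
reserves all 16 slots is my reading of the series, so treat the exact
totals accordingly):

        /* Back-of-envelope for the unused callback-slot overhead. */
        #include <stdio.h>

        int main(void)
        {
                unsigned long threads = 4000, depth = 50, slots = 16;
                unsigned long frame = 40 + slots * 8;           /* 168 bytes */
                unsigned long per_task = depth * frame;         /* 8400 bytes */

                printf("total shadow stacks: ~%lu MB\n",
                       threads * per_task >> 20);               /* ~32 MB */

                /* With N registered users, each frame touches only N slots: */
                for (unsigned long n = 1; n <= 2; n++) {
                        unsigned long unused = threads * depth * (slots - n) * 8;
                        printf("N=%lu: ~%lu MB of slots never used\n", n, unused >> 20);
                }
                return 0;
        }

which says roughly 22 MB (N=1) or 21 MB (N=2) of the ~32 MB would never
be touched.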