On Wed, 4 Jan 2017 11:01:02 +0100 Peter Zijlstra <pet...@infradead.org> wrote:

> On Wed, Jan 04, 2017 at 02:06:04PM +0900, Masami Hiramatsu wrote:
> > On Tue, 3 Jan 2017 11:54:02 +0100
> > Peter Zijlstra <pet...@infradead.org> wrote:
>
> > > How many entries should one expect on that list? I spend quite a bit of
> > > time reducing the cost of is_module_text_address() a while back and see
> > > that both ftrace (which actually needs this to be fast) and now
> > > kprobes have linear list walks in here.
> >
> > It depends on how many probes are used and optimized. However, in most
> > cases, there should be one entry (unless user defines optimized probes
> > over 32 on x86, from my experience, it is very rare case. :) )
>
> OK, that's good :-)

OK, I'll add the above comment to the patch.

>
> > > I'm assuming the ftrace thing to be mostly empty, since I never saw it
> > > on my benchmarks back then, but it is something Steve should look at I
> > > suppose.
> > >
> > > Similarly, the changelog here should include some talk about worst case
> > > costs.
> >
> > Would you have any good benchmark to measure it?
>
> Not trivially so; what I did was cobble together a debugfs file that
> measures the average of the PMI time in perf_sample_event_took(), and a
> module that has a 10 deep callchain around a while(1) loop. Then perf
> record with callchains for a few seconds.
>
> Generating the callchain does the unwinder thing and ends up calling
> is_kernel_address() lots.
>
> The case I worked on was 0 modules vs 100+ modules in a distro build,
> which was fairly obviously painful back then, since
> is_module_text_address() used a linear lookup.
>
> I'm not sure I still have all those bits, but I can dig around a bit if
> you're interested.

Hmm, I tried to do a similar thing (a test module that has a loop with a
10-deep recursive call and saves the stack trace) on KVM, but got very
unstable results. Maybe it needs to run on a bare-metal machine.

Thanks,

-- 
Masami Hiramatsu <mhira...@kernel.org>