Em Tue, Apr 26, 2016 at 10:26:32AM -0700, Alexei Starovoitov escreveu: > On Tue, Apr 26, 2016 at 06:38:28PM +0200, Peter Zijlstra wrote: > > On Mon, Apr 25, 2016 at 10:24:31PM -0300, Arnaldo Carvalho de Melo wrote: > > > Em Mon, Apr 25, 2016 at 10:03:58PM -0300, Arnaldo Carvalho de Melo > > > escreveu: > > > > I now need to continue investigation why this doesn't seem to work from > > > > tracepoints...
> > > Bummer, the changeset (at the end of this message) hasn't any > > > explanation, is this really impossible? I.e. LBR callstacks from > > > tracepoints? Even if we set perf_event_attr.exclude_callchain_kernel? > > Could maybe be done, but its tricky to implement as the LBR is managed > > by the hardware PMU and tracepoints are a software PMU, so we need to > > then somehow frob with cross-pmu resources, in a very arch specific way. > > And programmability of the hardware PMU will then depend on events > > outside of it. > btw we're thinking to add support for lbr to bpf, so that from the program > we can get accurate and fast stacks. That's especially important for user > space stacks. No clear idea how to do it yet, but it would be really useful. Yeah, and that already works in perf, its just that it doesn't work from some points (PERF_TYPE_SOFTWARE, PERF_TYPE_TRACEPOINT, etc), as described in the changeset I mentioned. 'perf trace --call-graph lbr' doesn't work right now even with it interested only in the user space bits, i.e. setting perf_event_attr.exclude_callchain_kernel. # perf trace --call-graph dwarf works, but that, as you mention, really isn't "fast" and sometimes not accurate, or at least wasn't with broken toolchains. Example of mixed strace-like with userspace-only DWARF callchains (would be lovely if this was with LBR, huh?) plus fp callchains for the sched:sched_switch tracepoint plus LBR callchains for a hardware event, cycles, look further below for the reason of the broken timestamps for PERF_TYPE_HARDWARE events: # perf trace -e nanosleep --event sched:sched_switch/call-graph=fp/ --ev cycles/call-graph=lbr,period=100/ usleep 1 18446744073709.551 ( ): cycles/call-graph=lbr,period=100/:) __intel_pmu_enable_all+0xfe200080 ([kernel.kallsyms]) intel_pmu_enable_all+0xfe200010 ([kernel.kallsyms]) x86_pmu_enable+0xfe200271 ([kernel.kallsyms]) perf_pmu_enable.part.81+0xfe200007 ([kernel.kallsyms]) ctx_resched+0xfe20007a ([kernel.kallsyms]) perf_event_exec+0xfe20011d ([kernel.kallsyms]) setup_new_exec+0xfe20006f ([kernel.kallsyms]) load_elf_binary+0xfe2003e3 ([kernel.kallsyms]) search_binary_handler+0xfe20009e ([kernel.kallsyms]) do_execveat_common.isra.38+0xfe20052c ([kernel.kallsyms]) sys_execve+0xfe20003a ([kernel.kallsyms]) do_syscall_64+0xfe200062 ([kernel.kallsyms]) return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms]) [0] ([unknown]) 0.310 ( 0.006 ms): usleep/20951 nanosleep(rqtp: 0x7ffda8904500 ) ... 0.310 ( ): sched:sched_switch:usleep:20951 [120] S ==> swapper/3:0 [120]) __schedule+0xfe200402 ([kernel.kallsyms]) schedule+0xfe200035 ([kernel.kallsyms]) do_nanosleep+0xfe20006f ([kernel.kallsyms]) hrtimer_nanosleep+0xfe2000dc ([kernel.kallsyms]) sys_nanosleep+0xfe20007a ([kernel.kallsyms]) do_syscall_64+0xfe200062 ([kernel.kallsyms]) return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms]) __nanosleep+0xffff00bfad62c010 (/usr/lib64/libc-2.22.so) 18446679523046.461 ( ): cycles/call-graph=lbr,period=100/:) perf_pmu_enable.part.81+0xfe200007 ([kernel.kallsyms]) __perf_event_task_sched_in+0xfe2001ad ([kernel.kallsyms]) finish_task_switch+0xfe200156 ([kernel.kallsyms]) __schedule+0xfe200397 ([kernel.kallsyms]) schedule+0xfe200035 ([kernel.kallsyms]) do_nanosleep+0xfe20006f ([kernel.kallsyms]) hrtimer_nanosleep+0xfe2000dc ([kernel.kallsyms]) sys_nanosleep+0xfe20007a ([kernel.kallsyms]) do_syscall_64+0xfe200062 ([kernel.kallsyms]) return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms]) [0] ([unknown]) 0.377 ( 0.073 ms): usleep/20951 ... [continued]: nanosleep()) = 0 [root@jouet ~]# perf_event_attr: type 0 (PERF_TYPE_HARDWARE) config 0 (PERF_COUNT_HW_CPU_CYCLES) size 112 { sample_period, sample_freq } 100 sample_type IP|TID|CALLCHAIN|BRANCH_STACK|IDENTIFIER read_format ID disabled 1 inherit 1 enable_on_exec 1 sample_id_all 1 exclude_guest 1 { wakeup_events, wakeup_watermark } 1 branch_sample_type USER|CALL_STACK|NO_FLAGS|NO_CYCLES missing PERF_SAMPLE_TIME, will fix. - Arnaldo