On 30/07/2025 21:26, Leo Yan wrote:
> Hi Adrian,
>
> On Mon, Jul 28, 2025 at 08:02:51PM +0300, Adrian Hunter wrote:
>> On 25/07/2025 12:59, Leo Yan wrote:
>>> This series extends Perf for fine-grained tracing by using BPF program
>>> to pause and resume AUX tracing. The BPF program can be attached to
>>> tracepoints (including ftrace tracepoints and dynamic tracepoints, like
>>> kprobe, kretprobe, uprobe and uretprobe).
>>
>> Using eBPF to pause/resume AUX tracing seems like a great idea.
>>
>> AFAICT with this patch set, there is just support for pause/resume
>> much like what could be done directly without eBPF, so I wonder if you
>> could share a bit more on how you see this evolving, and what your
>> future plans are?
>
> IIUC, you mean the tool can first use `perf probe` to create probes,
> then enable those tracepoints as PMU events for AUX pause and resume.
Yes, like:

$ sudo perf probe 'do_sys_openat2 how->flags how->mode'
Added new event:
  probe:do_sys_openat2 (on do_sys_openat2 with flags=how->flags mode=how->mode)

You can now use it in all perf tools, such as:

        perf record -e probe:do_sys_openat2 -aR sleep 1

$ sudo perf probe do_sys_openat2%return
Added new event:
  probe:do_sys_openat2__return (on do_sys_openat2%return)

You can now use it in all perf tools, such as:

        perf record -e probe:do_sys_openat2__return -aR sleep 1

$ sudo perf record --kcore -e intel_pt/aux-action=start-paused/k \
    -e probe:do_sys_openat2/aux-action=resume/ --filter='flags==0x98800' \
    -e probe:do_sys_openat2__return/aux-action=pause/ -- ls
arch   certs    CREDITS  cscope.out     drivers  fs   include  io_uring  Kbuild  kernel  LICENSES  Makefile  mm  perf.data  README  samples  security  tools  virt
block  COPYING  crypto   Documentation  init     ipc  Kconfig  lib  MAINTAINERS  net  rust  scripts  sound  usr
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.067 MB perf.data ]
$ sudo perf script --itrace=qi | grep -B1 instructions
ls 37607 [003] 36109.137560: probe:do_sys_openat2: (ffffffff9d2276a0) flags=0x98800 mode=0x0
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9cdc3834 native_write_msr+0x4 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9cdc3836 native_write_msr+0x6 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9cd26728 pt_config_start+0x58 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9cd27727 pt_event_start+0x107 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9d0d5a04 perf_event_aux_pause+0x114 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9d0d80f7 __perf_event_overflow+0x197 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9d0d844d perf_swevent_event+0x12d ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9d0d8738 perf_tp_event+0x188 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9d00fad6 kprobe_perf_func+0x256 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9d00fbbd kprobe_dispatcher+0x6d ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9cf80582 aggr_pre_handler+0x42 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9cdbcbb2 kprobe_ftrace_handler+0x152 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffffc12440f5 ftrace_trampoline+0xf5 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9d2276a5 do_sys_openat2+0x5 ([kernel.kallsyms])
ls 37607 [003] 36109.137563: 1 instructions:k: ffffffff9d4c3d60 hook_file_alloc_security+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137564: 1 instructions:k: ffffffff9d4a5050 apparmor_file_alloc_security+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137565: 1 instructions:k: ffffffff9d42d400 cap_capable+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137565: 1 instructions:k: ffffffff9d4a4b70 apparmor_capable+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137566: 1 instructions:k: ffffffff9d42d400 cap_capable+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137566: 1 instructions:k: ffffffff9d4a4b70 apparmor_capable+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137567: 1 instructions:k: ffffffff9d4c4e80 hook_file_open+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137567: 1 instructions:k: ffffffff9d4a5aa0 apparmor_file_open+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137567: 1 instructions:k: ffffffff9d31fb10 ext4_dir_open+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137567: 1 instructions:k: ffffffff9d4cc740 ima_file_check+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137567: 1 instructions:k: ffffffff9d4a5960 apparmor_current_getlsmprop_subj+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137568: 1 instructions:k: ffffffff9cdb76c0 arch_rethook_trampoline+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137568: 1 instructions:k: ffffffff9cf80670 kretprobe_rethook_handler+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137568: 1 instructions:k: ffffffff9d00fe90 kretprobe_dispatcher+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137568: 1 instructions:k: ffffffff9cd282c0 pt_event_stop+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137569: 1 instructions:k: ffffffff9cdc3834 native_write_msr+0x4 ([kernel.kallsyms])
>
> I would say one benefit of this series is that users can create the
> probes and bind the eBPF program for AUX pause and resume with a
> single command.
>
> To be honest, at the current stage, I don't have a clear idea of how
> to expand this feature. But one requirement is clear: AUX trace data
> is usually huge, so after an initial analysis developers might want to
> focus on profiling a specific function (based on function entry and
> exit) or a specific period (e.g., start tracing when one tracepoint is
> hit and stop when another tracepoint is hit).
>
> eBPF programs are powerful. Basically, we can extend this in two
> dimensions. One direction is to attach the eBPF program to more
> kernel modules, like networking, storage, etc. The other direction is
> to improve the eBPF program itself as a filter for finer-grained
> tracing: so far we only support limited filtering based on CPU ID or
> PID, and we could also extend the filtering to time, event types, etc.
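
To make the filtering direction a bit more concrete, below is a rough
user-space sketch of a filter decision that combines CPU ID, PID and a
time window. It is illustration only and not code from this series:
struct aux_filter, aux_filter_match() and aux_filter_selftest() are
hypothetical names, and the "-1 means any" / "end_ns == 0 means no
window" conventions are assumptions made for the example. In a real
eBPF program the inputs would presumably come from helpers such as
bpf_get_smp_processor_id(), bpf_get_current_pid_tgid() and
bpf_ktime_get_ns().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filter settings; none of these names come from the series. */
struct aux_filter {
	int32_t  cpu;       /* CPU to match, or -1 for any CPU (assumed convention) */
	int32_t  pid;       /* PID to match, or -1 for any PID (assumed convention) */
	uint64_t start_ns;  /* start of the time window, inclusive */
	uint64_t end_ns;    /* end of the time window, exclusive; 0 disables the check */
};

/* Return true if a probe hit should be allowed to pause/resume AUX tracing. */
static bool aux_filter_match(const struct aux_filter *f,
			     int32_t cpu, int32_t pid, uint64_t now_ns)
{
	if (f->cpu >= 0 && f->cpu != cpu)
		return false;
	if (f->pid >= 0 && f->pid != pid)
		return false;
	if (f->end_ns != 0 && (now_ns < f->start_ns || now_ns >= f->end_ns))
		return false;
	return true;
}

/* Small self-check showing the intended semantics. */
static void aux_filter_selftest(void)
{
	struct aux_filter f = { .cpu = 3, .pid = -1, .start_ns = 100, .end_ns = 200 };

	assert(aux_filter_match(&f, 3, 37607, 150));   /* CPU and time match */
	assert(!aux_filter_match(&f, 2, 37607, 150));  /* wrong CPU */
	assert(!aux_filter_match(&f, 3, 37607, 200));  /* window end is exclusive */
}
```

The checks are ordered from cheapest to most specific, and every
criterion can be disabled independently, so the existing CPU ID / PID
behaviour would be the special case with the time window switched off.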