On Wed, 3 Jun 2026 18:26:24 +0200
"David Hildenbrand (Arm)" <[email protected]> wrote:
> Yeah, I was fearing that when I read in [2]:
>
> "It has become clear in the past that this promise extends to
> tracepoints, most notably in 2011 when a tracepoint change broke
> powertop and had to be reverted."

Technically the issue is with trace events and not tracepoints. The
difference is that a trace event is created via the TRACE_EVENT() macro,
which defines what is to be collected from the tracepoint and exposes that
information in tracefs, where applications can easily see it.

A tracepoint is simply the hook in the code that you can attach to. Trace
events create a callback from that hook to extract the data from the
tracepoint to fill in the fields.
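
As a rough sketch of what "easily see" means in practice: a trace event
shows up as a directory with a format file under events/ in tracefs,
while a bare tracepoint does not. The helper name find_trace_event and
the TRACEFS variable are made up for illustration, and this assumes
tracefs is mounted at /sys/kernel/tracing:

```shell
# Assumption: tracefs is mounted here; override TRACEFS if it is elsewhere.
TRACEFS=${TRACEFS:-/sys/kernel/tracing}

# Hypothetical helper: print "group/event" for every trace event named $1.
# Prints nothing if no trace event with that name is exposed in tracefs.
find_trace_event() {
    for fmt in "$TRACEFS"/events/*/"$1"/format; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$fmt" ] || continue
        event_dir=$(dirname "$fmt")
        echo "$(basename "$(dirname "$event_dir")")/$(basename "$event_dir")"
    done
}
```

Running "find_trace_event memory_failure_event" as root prints the group
the event lives in, if it exists as a trace event on that kernel.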
>
> Which means that I now also fully understand
>
> "Some kernel maintainers prohibit or severely restrict the addition of
> tracepoints to their subsystems out of fear that a similar thing could
> happen to them. "
>
> Whatever the result of this discussion will be, I'll try to document it.

You can still create a tracepoint without creating a trace event by using
the DECLARE_TRACE() macro. The scheduler subsystem uses that quite
extensively. That creates a tracepoint without exposing it to tracefs. The
runtime verifier uses these hooks to monitor the scheduler.

But you can still connect to these tracepoints from tracefs via a tprobe. A
tprobe hooks to tracepoints that you need the source code to find (just
like an fprobe hooks to any function). Thus applications *can't* rely on
them because there's nothing there to tell you whether they exist or not.

For example, for the given tracepoint:
# cd /sys/kernel/tracing
# echo 't:rfail memory_failure_event pfn=pfn type=type result=result' > dynamic_events
# cat events/tracepoints/rfail/format
name: rfail
ID: 1894
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned long __probe_ip; offset:8; size:8; signed:0;
field:u64 pfn; offset:16; size:8; signed:0;
field:s32 type; offset:24; size:4; signed:1;
field:s32 result; offset:28; size:4; signed:1;
print fmt: "(%lx) pfn=%Lu type=%d result=%d", REC->__probe_ip, REC->pfn, REC->type, REC->result

This requires BTF to be available, and the above doesn't annotate the
result as nicely as a trace event would. But you can get data directly
from tracepoints this way.
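
To actually collect data, the event created above still has to be enabled
like any other event, and it can be removed again when done. A minimal
sketch, assuming tracefs is at /sys/kernel/tracing and the default
"tracepoints" group seen in the path above; the function names and the
TRACEFS variable are only for illustration:

```shell
# Assumption: tracefs is mounted here; override TRACEFS if it is elsewhere.
TRACEFS=${TRACEFS:-/sys/kernel/tracing}

# Hypothetical helper: enable the tprobe event named $1 (e.g. rfail).
# Recorded hits can then be read from "$TRACEFS/trace".
tprobe_enable() {
    echo 1 > "$TRACEFS/events/tracepoints/$1/enable"
}

# Hypothetical helper: disable the event, then delete it again.
# Append (>>) so that other dynamic events are left alone.
tprobe_remove() {
    echo 0 > "$TRACEFS/events/tracepoints/$1/enable"
    echo "-:tracepoints/$1" >> "$TRACEFS/dynamic_events"
}
```

With that, "tprobe_enable rfail" followed later by "cat trace" and
"tprobe_remove rfail" covers the whole life cycle of the probe.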
-- Steve