On Wed, 3 Jun 2026 18:26:24 +0200
"David Hildenbrand (Arm)" <[email protected]> wrote:

> Yeah, I was fearing that when I read in [2]:
> 
>       "It has become clear in the past that this promise extends to
>        tracepoints, most notably in 2011 when a tracepoint change broke
>        powertop and had to be reverted."

Technically the issue is with trace events and not tracepoints. The
difference is that a trace event is created via the TRACE_EVENT() macro
which defines what is to be collected from the tracepoint and exposes that
information to tracefs which applications can easily see.

A tracepoint is simply the hook in the code that you can attach to. Trace
events create a callback from that hook to extract the data from the
tracepoint to fill in the fields.
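To make the split concrete, here is a minimal userspace sketch in C. It is a model only, not kernel code: every name in it (model_tracepoint, model_register_probe, model_fire, model_event_probe) is hypothetical, and the argument list (pfn, type, result) is borrowed from the field names used later in this mail, so the real tracepoint prototype may differ. The "tracepoint" is just a hook that calls whatever is attached; the "trace event" is one particular callback that copies the arguments into a fixed record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: none of these names are kernel APIs. */
#define MAX_PROBES 4

/* Callback signature for this hypothetical hook. */
typedef void (*probe_fn)(void *data, uint64_t pfn, int type, int result);

/* The "tracepoint": a hook that holds the callbacks attached to it. */
struct model_tracepoint {
	probe_fn funcs[MAX_PROBES];
	void *data[MAX_PROBES];
	size_t nr_probes;
};

/* Attach a callback (and its private data) to the hook. */
static void model_register_probe(struct model_tracepoint *tp,
				 probe_fn fn, void *data)
{
	assert(tp->nr_probes < MAX_PROBES);
	tp->funcs[tp->nr_probes] = fn;
	tp->data[tp->nr_probes] = data;
	tp->nr_probes++;
}

/* The call site in the instrumented code: run every attached callback. */
static void model_fire(const struct model_tracepoint *tp,
		       uint64_t pfn, int type, int result)
{
	for (size_t i = 0; i < tp->nr_probes; i++)
		tp->funcs[i](tp->data[i], pfn, type, result);
}

/* The "trace event": a fixed record layout that consumers can see ... */
struct model_event_record {
	uint64_t pfn;
	int32_t type;
	int32_t result;
};

/* ... plus the callback that extracts the arguments into that record. */
static void model_event_probe(void *data, uint64_t pfn, int type, int result)
{
	struct model_event_record *rec = data;

	rec->pfn = pfn;
	rec->type = type;
	rec->result = result;
}
```

With nothing registered, model_fire() does nothing visible, which is the tracepoint-without-a-trace-event case. Registering model_event_probe is what gives the hook a visible record layout.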

> 
> Which means that I now also fully understand
> 
>       "Some kernel maintainers prohibit or severely restrict the addition of
>        tracepoints to their subsystems out of fear that a similar thing could
>        happen to them. "
> 
> Whatever the result of this discussion will be, I'll try to document it.

You can still create a tracepoint without creating a trace event by using
the DECLARE_TRACE() macro. The scheduler subsystem uses that quite
extensively. That creates a tracepoint without exposing it to tracefs. The
runtime verifier uses these hooks to monitor the scheduler.

But you can still connect to these tracepoints from tracefs via a tprobe. A
tprobe hooks to tracepoints that you need the source code to find (just
like an fprobe hooks to any function). Thus applications *can't* rely on
them because there's nothing there to tell you whether they exist or not.

For example, for the given tracepoint:

 # cd /sys/kernel/tracing
 # echo 't:rfail memory_failure_event pfn=pfn type=type result=result' > dynamic_events
 # cat events/tracepoints/rfail/format 
name: rfail
ID: 1894
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:unsigned long __probe_ip; offset:8;       size:8; signed:0;
        field:u64 pfn;  offset:16;      size:8; signed:0;
        field:s32 type; offset:24;      size:4; signed:1;
        field:s32 result;       offset:28;      size:4; signed:1;

print fmt: "(%lx) pfn=%Lu type=%d result=%d", REC->__probe_ip, REC->pfn, REC->type, REC->result

This requires BTF to be available, and the above doesn't annotate the result
as nicely as the trace event does. But you can get data directly from
tracepoints this way.
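As a rough illustration of what a consumer of that format listing ends up hard-coding, here is a C sketch that mirrors the offsets and sizes shown above and renders a record the way the print fmt line does. The struct and function names (rfail_record, rfail_decode, rfail_format) are made up for this example, and __probe_ip is renamed probe_ip because identifiers with a leading double underscore are reserved in C. It assumes the record is already available as a flat, native-endian buffer; how you get raw records out of the ring buffer is out of scope here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the field list in the format file shown above. */
struct rfail_record {
	uint16_t common_type;          /* offset 0,  size 2 */
	uint8_t  common_flags;         /* offset 2,  size 1 */
	uint8_t  common_preempt_count; /* offset 3,  size 1 */
	int32_t  common_pid;           /* offset 4,  size 4 */
	uint64_t probe_ip;             /* offset 8,  size 8 (__probe_ip) */
	uint64_t pfn;                  /* offset 16, size 8 */
	int32_t  type;                 /* offset 24, size 4 */
	int32_t  result;               /* offset 28, size 4 */
};

/* Fail the build if the compiler's layout disagrees with the listing. */
_Static_assert(offsetof(struct rfail_record, common_pid) == 4, "common_pid");
_Static_assert(offsetof(struct rfail_record, probe_ip) == 8, "probe_ip");
_Static_assert(offsetof(struct rfail_record, pfn) == 16, "pfn");
_Static_assert(offsetof(struct rfail_record, type) == 24, "type");
_Static_assert(offsetof(struct rfail_record, result) == 28, "result");
_Static_assert(sizeof(struct rfail_record) == 32, "record size");

/* Copy one raw record into the struct; returns -1 if the buffer is short. */
static int rfail_decode(const void *raw, size_t len, struct rfail_record *out)
{
	assert(raw != NULL && out != NULL);
	if (len < sizeof(*out))
		return -1;
	memcpy(out, raw, sizeof(*out));
	return 0;
}

/* Render the record following the print fmt line of the format file. */
static int rfail_format(const struct rfail_record *rec, char *buf, size_t size)
{
	return snprintf(buf, size,
			"(%" PRIx64 ") pfn=%" PRIu64 " type=%" PRId32
			" result=%" PRId32,
			rec->probe_ip, rec->pfn, rec->type, rec->result);
}
```

Everything in that struct comes from a probe that was set up by hand on one kernel, so it illustrates the point above: there is nothing that promises these offsets, or even the tracepoint itself, to an application.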

-- Steve
