On 6/30/26 16:48, Gabriele Monaco wrote:
On Mon, 2026-06-29 at 00:47 +0800, Wen Yang wrote:
On [2-1]~[2-5]: embedded consumer causes UAF on PREEMPT_RT
-----------------------------------------------------------

The uprobe_bind selftest oopses on PREEMPT_RT(full):

    handler_chain+0xc9: mov rax, [r15+0x18]  ; advance list iterator
    RAX: 000015ec00001f28   ; garbage — &uprobe->consumers after kfree

handler_chain() reads uc->cons_node.next after uc->handler() returns,
still inside rcu_read_lock_trace().  That pointer is &uprobe->consumers
(the list head embedded in struct uprobe), which gets freed through:

    put_uprobe()
      -> schedule_work(uprobe_free_deferred)   /* async */
         -> call_srcu(&uretprobes_srcu, ...)
            -> call_rcu_tasks_trace(kfree_uprobe)
               -> kfree(uprobe)

rv_uprobe_sync() calls uprobe_unregister_sync() which calls
synchronize_srcu(&uretprobes_srcu), but that only matters after the
kworker has submitted work to uretprobes_srcu.  On a loaded PREEMPT_RT
box the kworker may not have run yet, so synchronize_srcu() returns
immediately and kfree(uprobe) races with the still-iterating
handler_chain():

    CPU A                                CPU B
    consumer_del() → list_del_rcu        rcu_read_lock_trace()
    put_uprobe()   → schedule_work         uc->handler() returns
    rv_uprobe_sync():                       reading cons_node.next...
      synchronize_srcu(&uretprobes_srcu)

If CPU B is in an RCU trace read-side critical section this doesn't
return immediately, it returns after readers are done, doesn't it?
(well, what you really care here is the synchronize_rcu_tasks_trace()
but we do both).

      ← idle; returns immediately
    [kworker fires later]
      kfree(uprobe)  ← frees &uprobe->consumers
                                          cons_node.next = freed mem → CRASH


So I had a bit of a look and as far as I understand, we don't have any
control over the uprobe allocation pattern (workqueues and whatnot) and
we don't really care as long as we deregister it appropriately.

What we do control is the uprobe_consumer, that must be freed only after
the uprobe was synchronously deregistered, that should guarantee no
reader is going to reference it, shouldn't it?

uprobe_unregister_nosync() removes it from the cons_node list, so it's
safe to assume any handler_chain() after the next RCU-trace grace period
won't see it.

In the sketch I sent this is happening (all kfree(b) are after
rv_uprobe_unregister() which does the sync). What am I missing here?


uprobe is also freed after both grace periods, so no reader should use
that either.

The situation you're seeing isn't fully clear to me, I applied my sketch
and don't see any splat on a vng box.

With uc embedded in the binding (as [2-1] suggests), no amount of
delaying kfree(binding) helps: uprobe->consumers is freed by a chain we
don't control.  The fix is to keep uc->cons_node in memory that outlives
the handler_chain() iteration, which means a separate allocation freed
only after rv_uprobe_sync():

    rv_uprobe_unregister_nosync()  /* list_del_rcu + schedule_work */
    rv_uprobe_sync()               /* waits for any already-submitted
srcu work */

We of course need to do any free after this sync, but I don't get why we
need an additional allocation since the following are both plain
synchronous frees, why isn't a single one (on b) enough?


Thank you for the patient explanation.
You are right, and v4 implements the embedded approach.
struct rv_uprobe directly embeds struct uprobe_consumer:

  struct rv_uprobe {
      struct uprobe_consumer  uc;
      struct uprobe          *uprobe;
      struct inode           *inode;
      void                   *priv;
      int (*handler)(...);
      int (*ret_handler)(...);
  };

rv_uprobe_free() is gone — no allocation means no explicit free.  After
rv_uprobe_unregister() (or rv_uprobe_unregister_nosync() + rv_uprobe_sync()), the caller frees the containing struct directly. In tlob:

  rv_uprobe_unregister_nosync(&b->start_probe);
  rv_uprobe_unregister_nosync(&b->stop_probe);
  rv_uprobe_sync();
  kfree(b);   /* frees both embedded consumers */

This is the pattern from your sketch.

Regarding my earlier reply that argued for the separate allocation: I was wrong. The key barrier is synchronize_rcu_tasks_trace() (called first in uprobe_unregister_sync()), which waits for all rcu_read_lock_trace() readers including handler_chain(). After it returns, no cons_node.next read is in flight and embedding is safe.

We appreciate your thorough review. All of your comments have been addressed in v4. We'll run local tests for one or two days, and then it will be sent out shortly.

--
Best wishes,
Wen



Reply via email to