On Wed, 2026-05-20 at 17:55 +0800, Kaitao Cheng wrote:
> 在 2026/5/20 06:56, Eduard Zingerman 写道:
> > On Mon, 2026-05-18 at 11:02 +0800, Kaitao Cheng wrote:
> >
> > [...]
> >
> > > > > > The patch does have a bug, however. To fix the issues we are seeing
> > > > > > now,
> > > > > > I propose the additional changes below and would appreciate
> > > > > > feedback.
> > > > > >
> > > > > > --- a/kernel/bpf/helpers.c
> > > > > > +++ b/kernel/bpf/helpers.c
> > > > > > @@ -2263,8 +2263,10 @@ void bpf_list_head_free(const struct
> > > > > > btf_field *field, void *list_head,
> > > > > > if (!head->next || list_empty(head))
> > > > > > goto unlock;
> > > > > > list_for_each_safe(pos, n, head) {
> > > > > > - WRITE_ONCE(container_of(pos,
> > > > > > - struct bpf_list_node_kern,
> > > > > > list_head)->owner, NULL);
> > > > > > + struct bpf_list_node_kern *node;
> > > > > > +
> > > > > > + node = container_of(pos, struct bpf_list_node_kern,
> > > > > > list_head);
> > > > > > + WRITE_ONCE(node->owner, BPF_PTR_POISON);
> > > > > > list_move_tail(pos, &drain);
> > > > > > }
> > > > > > unlock:
> > > > > > @@ -2272,8 +2274,12 @@ void bpf_list_head_free(const struct
> > > > > > btf_field *field, void *list_head,
> > > > > > __bpf_spin_unlock_irqrestore(spin_lock);
> > > > > >
> > > > > > while (!list_empty(&drain)) {
> > > > > > + struct bpf_list_node_kern *node;
> > > > > > +
> > > > > > pos = drain.next;
> > > > > > + node = container_of(pos, struct bpf_list_node_kern,
> > > > > > list_head);
> > > > > > list_del_init(pos);
> > > > > > + WRITE_ONCE(node->owner, NULL);
> >
> > Is CPU allowed to reorder the stores in list_del_init() and WRITE_ONCE()?
> > If it is, I think there is a race here.
>
> Thanks for taking a close look at this. You are right that there is an
> ordering issue here, but I don't think the specific sequence illustrated
> by the example below is problematic.
>
> > Thread #1:
> > enter bpf_list_head_free()
> > acquire H1 lock
> > list_move_tail(pos, &drain); // reordered
> > <-- ip here -->
> > WRITE_ONCE(node->owner, BPF_PTR_POISON); // reordered
> >
> > Thread #2:
> >
> > acquire H1 lock
> > n = bpf_refcount_acquire()
> > release H1 lock
> > acquire H2 lock
> > enter __bpf_list_add()
> > <-- ip here -->
> > cmpxchg(&node->owner, NULL, BPF_PTR_POISON)
>
> Even if the stores from list_move_tail(pos, &drain) become visible before
> WRITE_ONCE(node->owner, BPF_PTR_POISON), node->owner is not NULL in that
> window. Before the WRITE_ONCE(), it still points to H1. After the
> WRITE_ONCE(),
> it is BPF_PTR_POISON. In both cases, __bpf_list_add() will fail:
>
> cmpxchg(&node->owner, NULL, BPF_PTR_POISON)
>
> because the old value is neither NULL nor expected to become NULL from this
> part of bpf_list_head_free().
>
>
> However, I agree that your original concern about the ordering between
> list_del_init() and WRITE_ONCE(node->owner, NULL) is valid for the later
> drain loop:
>
> list_del_init(pos);
> WRITE_ONCE(node->owner, NULL);
>
> Here owner == NULL is the signal that the node can be inserted into another
> list. Since WRITE_ONCE() does not provide release ordering, another CPU may
> observe owner == NULL and successfully acquire the node in __bpf_list_add()
> before the list_del_init() stores are visible. In that case __bpf_list_add()
> can link the node into H2, and the delayed stores from list_del_init() may
> then overwrite the node's list pointers and corrupt the H2 list.
>
> So the fix should be to publish owner == NULL with release ordering after the
> node has been fully unlinked, for example:
>
> ```
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2279,7 +2279,8 @@ void bpf_list_head_free(const struct btf_field *field,
> void *list_head,
> pos = drain.next;
> node = container_of(pos, struct bpf_list_node_kern,
> list_head);
> list_del_init(pos);
> - WRITE_ONCE(node->owner, NULL);
> + /* Ensure __bpf_list_add() sees the node as unlinked. */
> + smp_store_release(&node->owner, NULL);
> /* The contained type can also have resources, including a
> * bpf_list_head which needs to be freed.
> */
> @@ -2607,7 +2608,8 @@ static struct bpf_list_node *__bpf_list_del(struct
> bpf_list_head *head,
> return NULL;
>
> list_del_init(n);
> - WRITE_ONCE(node->owner, NULL);
> + /* Ensure __bpf_list_add() sees the node as unlinked. */
> + smp_store_release(&node->owner, NULL);
> return (struct bpf_list_node *)n;
> }
> ```
>
> The existing cmpxchg() in __bpf_list_add() is a successful RMW with return
> value, so it is fully ordered and is sufficient on the acquire side.
Hi Kaitao,
Thank you for the analysis. I agree with the smp_store_release()
approach, could you please respin the series?