On Wed, 2026-05-20 at 17:55 +0800, Kaitao Cheng wrote:
> 在 2026/5/20 06:56, Eduard Zingerman 写道:
> > On Mon, 2026-05-18 at 11:02 +0800, Kaitao Cheng wrote:
> > 
> > [...]
> > 
> > > > > > The patch does have a bug, however. To fix the issues we are seeing 
> > > > > > now,
> > > > > > I propose the additional changes below and would appreciate 
> > > > > > feedback.
> > > > > > 
> > > > > > --- a/kernel/bpf/helpers.c
> > > > > > +++ b/kernel/bpf/helpers.c
> > > > > > @@ -2263,8 +2263,10 @@ void bpf_list_head_free(const struct 
> > > > > > btf_field *field, void *list_head,
> > > > > >         if (!head->next || list_empty(head))
> > > > > >                 goto unlock;
> > > > > >         list_for_each_safe(pos, n, head) {
> > > > > > -               WRITE_ONCE(container_of(pos,
> > > > > > -                       struct bpf_list_node_kern, 
> > > > > > list_head)->owner, NULL);
> > > > > > +               struct bpf_list_node_kern *node;
> > > > > > +
> > > > > > +               node = container_of(pos, struct bpf_list_node_kern, 
> > > > > > list_head);
> > > > > > +               WRITE_ONCE(node->owner, BPF_PTR_POISON);
> > > > > >                 list_move_tail(pos, &drain);
> > > > > >         }
> > > > > >  unlock:
> > > > > > @@ -2272,8 +2274,12 @@ void bpf_list_head_free(const struct 
> > > > > > btf_field *field, void *list_head,
> > > > > >         __bpf_spin_unlock_irqrestore(spin_lock);
> > > > > > 
> > > > > >         while (!list_empty(&drain)) {
> > > > > > +               struct bpf_list_node_kern *node;
> > > > > > +
> > > > > >                 pos = drain.next;
> > > > > > +               node = container_of(pos, struct bpf_list_node_kern, 
> > > > > > list_head);
> > > > > >                 list_del_init(pos);
> > > > > > +               WRITE_ONCE(node->owner, NULL);
> > 
> > Is CPU allowed to reorder the stores in list_del_init() and WRITE_ONCE()?
> > If it is, I think there is a race here.
> 
> Thanks for taking a close look at this. You are right that there is an
> ordering issue here, but I don't think the specific sequence illustrated
> by the example below is problematic.
> 
> > Thread #1:
> >   enter bpf_list_head_free()
> >   acquire H1 lock
> >   list_move_tail(pos, &drain);             // reordered
> >   <-- ip here -->
> >   WRITE_ONCE(node->owner, BPF_PTR_POISON); // reordered
> > 
> > Thread #2:
> > 
> >   acquire H1 lock
> >   n = bpf_refcount_acquire()
> >   release H1 lock
> >   acquire H2 lock
> >   enter __bpf_list_add()
> >   <-- ip here -->
> >   cmpxchg(&node->owner, NULL, BPF_PTR_POISON)
> 
> Even if the stores from list_move_tail(pos, &drain) become visible before
> WRITE_ONCE(node->owner, BPF_PTR_POISON), node->owner is not NULL in that
> window. Before the WRITE_ONCE(), it still points to H1. After the 
> WRITE_ONCE(),
> it is BPF_PTR_POISON. In both cases, __bpf_list_add() will fail:
> 
>       cmpxchg(&node->owner, NULL, BPF_PTR_POISON)
> 
> because the old value is neither NULL nor expected to become NULL from this
> part of bpf_list_head_free().
> 
> 
> However, I agree that your original concern about the ordering between
> list_del_init() and WRITE_ONCE(node->owner, NULL) is valid for the later
> drain loop:
> 
>       list_del_init(pos);
>       WRITE_ONCE(node->owner, NULL);
> 
> Here owner == NULL is the signal that the node can be inserted into another
> list. Since WRITE_ONCE() does not provide release ordering, another CPU may
> observe owner == NULL and successfully acquire the node in __bpf_list_add()
> before the list_del_init() stores are visible. In that case __bpf_list_add()
> can link the node into H2, and the delayed stores from list_del_init() may
> then overwrite the node's list pointers and corrupt the H2 list.
> 
> So the fix should be to publish owner == NULL with release ordering after the
> node has been fully unlinked, for example:
> 
> ```
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2279,7 +2279,8 @@ void bpf_list_head_free(const struct btf_field *field, 
> void *list_head,
>                 pos = drain.next;
>                 node = container_of(pos, struct bpf_list_node_kern, 
> list_head);
>                 list_del_init(pos);
> -               WRITE_ONCE(node->owner, NULL);
> +               /* Ensure __bpf_list_add() sees the node as unlinked. */
> +               smp_store_release(&node->owner, NULL);
>                 /* The contained type can also have resources, including a
>                  * bpf_list_head which needs to be freed.
>                  */
> @@ -2607,7 +2608,8 @@ static struct bpf_list_node *__bpf_list_del(struct 
> bpf_list_head *head,
>                 return NULL;
> 
>         list_del_init(n);
> -       WRITE_ONCE(node->owner, NULL);
> +       /* Ensure __bpf_list_add() sees the node as unlinked. */
> +       smp_store_release(&node->owner, NULL);
>         return (struct bpf_list_node *)n;
>  }
> ```
> 
> The existing cmpxchg() in __bpf_list_add() is a successful RMW with return
> value, so it is fully ordered and is sufficient on the acquire side.

Hi Kaitao,

Thank you for the analysis. I agree with the smp_store_release()
approach, could you please respin the series?

Reply via email to