在 2026/5/20 06:56, Eduard Zingerman 写道:
> On Mon, 2026-05-18 at 11:02 +0800, Kaitao Cheng wrote:
> 
> [...]
> 
>>>>> The patch does have a bug, however. To fix the issues we are seeing now,
>>>>> I propose the additional changes below and would appreciate feedback.
>>>>>
>>>>> --- a/kernel/bpf/helpers.c
>>>>> +++ b/kernel/bpf/helpers.c
>>>>> @@ -2263,8 +2263,10 @@ void bpf_list_head_free(const struct btf_field 
>>>>> *field, void *list_head,
>>>>>         if (!head->next || list_empty(head))
>>>>>                 goto unlock;
>>>>>         list_for_each_safe(pos, n, head) {
>>>>> -               WRITE_ONCE(container_of(pos,
>>>>> -                       struct bpf_list_node_kern, list_head)->owner, 
>>>>> NULL);
>>>>> +               struct bpf_list_node_kern *node;
>>>>> +
>>>>> +               node = container_of(pos, struct bpf_list_node_kern, 
>>>>> list_head);
>>>>> +               WRITE_ONCE(node->owner, BPF_PTR_POISON);
>>>>>                 list_move_tail(pos, &drain);
>>>>>         }
>>>>>  unlock:
>>>>> @@ -2272,8 +2274,12 @@ void bpf_list_head_free(const struct btf_field 
>>>>> *field, void *list_head,
>>>>>         __bpf_spin_unlock_irqrestore(spin_lock);
>>>>>
>>>>>         while (!list_empty(&drain)) {
>>>>> +               struct bpf_list_node_kern *node;
>>>>> +
>>>>>                 pos = drain.next;
>>>>> +               node = container_of(pos, struct bpf_list_node_kern, 
>>>>> list_head);
>>>>>                 list_del_init(pos);
>>>>> +               WRITE_ONCE(node->owner, NULL);
> 
> Is CPU allowed to reorder the stores in list_del_init() and WRITE_ONCE()?
> If it is, I think there is a race here.

Thanks for taking a close look at this. You are right that there is an
ordering issue here, but I don't think the specific sequence illustrated
by the example below is problematic.

> Thread #1:
>   enter bpf_list_head_free()
>   acquire H1 lock
>   list_move_tail(pos, &drain);             // reordered
>   <-- ip here -->
>   WRITE_ONCE(node->owner, BPF_PTR_POISON); // reordered
> 
> Thread #2:
> 
>   acquire H1 lock
>   n = bpf_refcount_acquire()
>   release H1 lock
>   acquire H2 lock
>   enter __bpf_list_add()
>   <-- ip here -->
>   cmpxchg(&node->owner, NULL, BPF_PTR_POISON)

Even if the stores from list_move_tail(pos, &drain) become visible before
WRITE_ONCE(node->owner, BPF_PTR_POISON), node->owner is not NULL in that
window. Before the WRITE_ONCE(), it still points to H1. After the WRITE_ONCE(),
it is BPF_PTR_POISON. In both cases, __bpf_list_add() will fail:

        cmpxchg(&node->owner, NULL, BPF_PTR_POISON)

because the old value is neither NULL nor expected to become NULL from this
part of bpf_list_head_free().


However, I agree that your original concern about the ordering between
list_del_init() and WRITE_ONCE(node->owner, NULL) is valid for the later
drain loop:

        list_del_init(pos);
        WRITE_ONCE(node->owner, NULL);

Here owner == NULL is the signal that the node can be inserted into another
list. Since WRITE_ONCE() does not provide release ordering, another CPU may
observe owner == NULL and successfully acquire the node in __bpf_list_add()
before the list_del_init() stores are visible. In that case __bpf_list_add()
can link the node into H2, and the delayed stores from list_del_init() may
then overwrite the node's list pointers and corrupt the H2 list.

So the fix should be to publish owner == NULL with release ordering after the
node has been fully unlinked, for example:

```
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2279,7 +2279,8 @@ void bpf_list_head_free(const struct btf_field *field, 
void *list_head,
                pos = drain.next;
                node = container_of(pos, struct bpf_list_node_kern, list_head);
                list_del_init(pos);
-               WRITE_ONCE(node->owner, NULL);
+               /* Ensure __bpf_list_add() sees the node as unlinked. */
+               smp_store_release(&node->owner, NULL);
                /* The contained type can also have resources, including a
                 * bpf_list_head which needs to be freed.
                 */
@@ -2607,7 +2608,8 @@ static struct bpf_list_node *__bpf_list_del(struct 
bpf_list_head *head,
                return NULL;

        list_del_init(n);
-       WRITE_ONCE(node->owner, NULL);
+       /* Ensure __bpf_list_add() sees the node as unlinked. */
+       smp_store_release(&node->owner, NULL);
        return (struct bpf_list_node *)n;
 }
```

The existing cmpxchg() in __bpf_list_add() is a successful RMW with return
value, so it is fully ordered and is sufficient on the acquire side.

-- 
Thanks
Kaitao Cheng


Reply via email to