在 2026/5/20 06:56, Eduard Zingerman 写道:
> On Mon, 2026-05-18 at 11:02 +0800, Kaitao Cheng wrote:
>
> [...]
>
>>>>> The patch does have a bug, however. To fix the issues we are seeing now,
>>>>> I propose the additional changes below and would appreciate feedback.
>>>>>
>>>>> --- a/kernel/bpf/helpers.c
>>>>> +++ b/kernel/bpf/helpers.c
>>>>> @@ -2263,8 +2263,10 @@ void bpf_list_head_free(const struct btf_field
>>>>> *field, void *list_head,
>>>>> if (!head->next || list_empty(head))
>>>>> goto unlock;
>>>>> list_for_each_safe(pos, n, head) {
>>>>> - WRITE_ONCE(container_of(pos,
>>>>> - struct bpf_list_node_kern, list_head)->owner,
>>>>> NULL);
>>>>> + struct bpf_list_node_kern *node;
>>>>> +
>>>>> + node = container_of(pos, struct bpf_list_node_kern,
>>>>> list_head);
>>>>> + WRITE_ONCE(node->owner, BPF_PTR_POISON);
>>>>> list_move_tail(pos, &drain);
>>>>> }
>>>>> unlock:
>>>>> @@ -2272,8 +2274,12 @@ void bpf_list_head_free(const struct btf_field
>>>>> *field, void *list_head,
>>>>> __bpf_spin_unlock_irqrestore(spin_lock);
>>>>>
>>>>> while (!list_empty(&drain)) {
>>>>> + struct bpf_list_node_kern *node;
>>>>> +
>>>>> pos = drain.next;
>>>>> + node = container_of(pos, struct bpf_list_node_kern,
>>>>> list_head);
>>>>> list_del_init(pos);
>>>>> + WRITE_ONCE(node->owner, NULL);
>
> Is CPU allowed to reorder the stores in list_del_init() and WRITE_ONCE()?
> If it is, I think there is a race here.
Thanks for taking a close look at this. You are right that there is an
ordering issue here, but I don't think the specific sequence illustrated
by the example below is problematic.
> Thread #1:
> enter bpf_list_head_free()
> acquire H1 lock
> list_move_tail(pos, &drain); // reordered
> <-- ip here -->
> WRITE_ONCE(node->owner, BPF_PTR_POISON); // reordered
>
> Thread #2:
>
> acquire H1 lock
> n = bpf_refcount_acquire()
> release H1 lock
> acquire H2 lock
> enter __bpf_list_add()
> <-- ip here -->
> cmpxchg(&node->owner, NULL, BPF_PTR_POISON)
Even if the stores from list_move_tail(pos, &drain) become visible before
WRITE_ONCE(node->owner, BPF_PTR_POISON), node->owner is not NULL in that
window. Before the WRITE_ONCE(), it still points to H1. After the WRITE_ONCE(),
it is BPF_PTR_POISON. In both cases, __bpf_list_add() will fail:
cmpxchg(&node->owner, NULL, BPF_PTR_POISON)
because the old value is neither NULL nor expected to become NULL from this
part of bpf_list_head_free().
However, I agree that your original concern about the ordering between
list_del_init() and WRITE_ONCE(node->owner, NULL) is valid for the later
drain loop:
list_del_init(pos);
WRITE_ONCE(node->owner, NULL);
Here owner == NULL is the signal that the node can be inserted into another
list. Since WRITE_ONCE() does not provide release ordering, another CPU may
observe owner == NULL and successfully acquire the node in __bpf_list_add()
before the list_del_init() stores are visible. In that case __bpf_list_add()
can link the node into H2, and the delayed stores from list_del_init() may
then overwrite the node's list pointers and corrupt the H2 list.
So the fix should be to publish owner == NULL with release ordering after the
node has been fully unlinked, for example:
```
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2279,7 +2279,8 @@ void bpf_list_head_free(const struct btf_field *field,
void *list_head,
pos = drain.next;
node = container_of(pos, struct bpf_list_node_kern, list_head);
list_del_init(pos);
- WRITE_ONCE(node->owner, NULL);
+ /* Ensure __bpf_list_add() sees the node as unlinked. */
+ smp_store_release(&node->owner, NULL);
/* The contained type can also have resources, including a
* bpf_list_head which needs to be freed.
*/
@@ -2607,7 +2608,8 @@ static struct bpf_list_node *__bpf_list_del(struct
bpf_list_head *head,
return NULL;
list_del_init(n);
- WRITE_ONCE(node->owner, NULL);
+ /* Ensure __bpf_list_add() sees the node as unlinked. */
+ smp_store_release(&node->owner, NULL);
return (struct bpf_list_node *)n;
}
```
The existing cmpxchg() in __bpf_list_add() is a successful RMW with return
value, so it is fully ordered and is sufficient on the acquire side.
--
Thanks
Kaitao Cheng