On 4/3/25 14:24, Paolo Abeni wrote:
On 4/2/25 7:42 AM, Bui Quang Minh wrote:
When setting up XDP for a running interface, we call napi_disable() on
the receive queue's napi. The delayed refill_work also calls
napi_disable() on the receive queue's napi. This can lead to a deadlock
when napi_disable() is called on an already disabled napi. This commit
fixes this by disabling future delayed refill work and cancelling any
in-flight delayed refill work before calling napi_disable() in
virtnet_xdp_set.

Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set")
Signed-off-by: Bui Quang Minh <minhquangbu...@gmail.com>
---
  drivers/net/virtio_net.c | 12 ++++++++++++
  1 file changed, 12 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7e4617216a4b..33406d59efe2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -5956,6 +5956,15 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
        if (!prog && !old_prog)
                return 0;
+       /*
+        * Make sure refill_work does not run concurrently to
+        * avoid napi_disable race which leads to deadlock.
+        */
+       if (netif_running(dev)) {
+               disable_delayed_refill(vi);
+               cancel_delayed_work_sync(&vi->refill);
AFAICS at this point refill_work() could still be running. Why don't you
need to call flush_delayed_work()?

AFAIK, cancel_delayed_work_sync() (the synchronous variant) provides a somewhat stronger guarantee than flush_delayed_work(). Internally, cancel_delayed_work_sync() also calls __flush_work(), but it temporarily disables the work before doing so, so even if refill_work() tries to re-queue itself, that re-queue fails. Since refill_work() can in fact re-queue itself, I think we must use cancel_delayed_work_sync() here.
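
To make the difference concrete, here is a toy userspace model of the two behaviours. It is only a sketch of the reasoning above, not the kernel workqueue implementation: every toy_* name is made up for illustration, there is no real concurrency, and the "in flight" instance is modelled by simply running the function inside the cancel call.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all names here are hypothetical, not kernel APIs. */
struct toy_work {
	int disable_depth;	/* > 0: queueing attempts are refused */
	bool pending;		/* an instance is queued and waiting to run */
	int runs;		/* how many times the work function has run */
};

/* Models queueing: refused while disabled or already pending. */
static bool toy_queue(struct toy_work *w)
{
	if (w->disable_depth > 0 || w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Models a work function that always tries to re-queue itself. */
static void toy_refill_fn(struct toy_work *w)
{
	w->runs++;
	toy_queue(w);
}

/* Models a flush: run the queued instance to completion, nothing more. */
static void toy_flush(struct toy_work *w)
{
	if (w->pending) {
		w->pending = false;
		toy_refill_fn(w);	/* may leave the work pending again */
	}
}

/*
 * Models a synchronous cancel: disable, drop the queued instance,
 * wait for an in-flight instance, then re-enable.
 */
static void toy_cancel_sync(struct toy_work *w, bool in_flight)
{
	w->disable_depth++;
	w->pending = false;
	if (in_flight)
		toy_refill_fn(w);	/* its re-queue attempt is refused */
	assert(w->disable_depth > 0);
	w->disable_depth--;
}
```

In this model, queueing the work and calling toy_flush() leaves it pending again, because the function re-queued itself. Calling toy_cancel_sync() instead leaves nothing pending, even when an instance was in flight.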

Thanks,
Quang Minh.

