https://bugs.dpdk.org/show_bug.cgi?id=1394
Bug ID: 1394
Summary: vq_assert_lock__ fail in vhost_user_set_vring_addr during live migration with HW vDPA
Product: DPDK
Version: unspecified
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: critical
Priority: Normal
Component: vhost/virtio
Assignee: dev@dpdk.org
Reporter: yaj...@nvidia.com
Target Milestone: ---

DPDK version: v24.03-rc1

1. Boot vDPA and QEMU (8.1)

build/examples/dpdk-vdpa -a 5e:00.2,class=vdpa --file-prefix vf0 --log-level=.,info -- --client --iface /tmp/vfe-net

<qemu:arg value='-chardev'/>
<qemu:arg value='socket,id=charnet1,path=/tmp/vfe-net0,server'/>
<qemu:arg value='-netdev'/>
<qemu:arg value='vhost-user,chardev=charnet1,queues=2,id=hostnet1'/>
<qemu:arg value='-device'/>
<qemu:arg value='virtio-net-pci,mq=on,vectors=6,netdev=hostnet1,id=net1,mac=00:00:00:00:33:00,bus=pci.0,addr=0x8,page-per-vq=on'/>

2. Live-migrate the VM to another server

sudo virsh migrate --verbose --live --persistent gen-l-vrt-295-005-CentOS-7.4 qemu+ssh://gen-l-vrt-294/system --unsafe

3. DPDK crashes

Once the device is configured, vhost_user_lock_all_queue_pairs is no longer called, so vq_assert_lock__ fails in vhost_user_set_vring_addr in the vDPA case.

Related to commit:

commit 741dc052eaf9459cc576b0d87e96a40069485c32 (HEAD)
Author: David Marchand david.march...@redhat.com
Date: Tue Dec 5 10:45:34 2023 +0100

    vhost: annotate virtqueue access checks

    Modifying vq->access_ok should be done with a write lock taken.
    Annotate vring_translate() and vring_invalidate().

new port /tmp/vfe-net0, device : 5e:00.2
mlx5_vdpa: MTU cannot be set on device 5e:00.2.
mlx5_vdpa: Region 0: HVA 0x7fff00000000, GPA 0x0, size 0xc0000000.
mlx5_vdpa: Region 1: HVA 0x7ffdc0000000, GPA 0x100000000, size 0x140000000.
mlx5_vdpa: Indirect mkey mode is KLM Fixed Buffer Size.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 0.
mlx5_vdpa: Virtq 0 notifier state is enabled.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 1.
mlx5_vdpa: Virtq 1 notifier state is enabled.
[New Thread 0x7fffe7b9b400 (LWP 962699)]
mlx5_vdpa: vDPA device 0 was configured.
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: (/tmp/vfe-net0) set queue enable: 1 to qp idx: 0
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: (/tmp/vfe-net0) set queue enable: 1 to qp idx: 1
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: (/tmp/vfe-net0) set queue enable: 1 to qp idx: 2
mlx5_vdpa: Update virtq 2 status disable -> enable.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 2.
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: (/tmp/vfe-net0) set queue enable: 1 to qp idx: 3
mlx5_vdpa: Update virtq 3 status disable -> enable.
mlx5_vdpa: Virtq 2 notifier state is enabled.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 3.
mlx5_vdpa: Virtq 3 notifier state is enabled.
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_LOG_BASE
VHOST_CONFIG: (/tmp/vfe-net0) log mmap size: 294912, offset: 0
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_FEATURES
VHOST_CONFIG: (/tmp/vfe-net0) negotiated Virtio features: 0x144601803
mlx5_vdpa: mlx5 vdpa: enabling dirty logging...
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_GET_STATUS
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_VRING_ADDR
EAL: PANIC in vq_assert_lock__():
VHOST_CONFIG: (/tmp/vfe-net0) vhost_user_set_vring_addr() called without access lock taken.
0: /images/vdpa/dpdk/build/examples/dpdk-vdpa (rte_dump_stack+0x1f) [aadfca]
1: /images/vdpa/dpdk/build/examples/dpdk-vdpa (__rte_panic+0xe2) [a8032b]
2: /images/vdpa/dpdk/build/examples/dpdk-vdpa (400000+0x3bac83) [7bac83]
3: /images/vdpa/dpdk/build/examples/dpdk-vdpa (400000+0x3bd0ef) [7bd0ef]
4: /images/vdpa/dpdk/build/examples/dpdk-vdpa (vhost_user_msg_handler+0x508) [7c264f]
5: /images/vdpa/dpdk/build/examples/dpdk-vdpa (400000+0x36b797) [76b797]
6: /images/vdpa/dpdk/build/examples/dpdk-vdpa (fdset_event_dispatch+0x1cd) [769793]
7: /images/vdpa/dpdk/build/examples/dpdk-vdpa (400000+0x6970ca) [a970ca]
8: /images/vdpa/dpdk/build/examples/dpdk-vdpa (400000+0x6af41c) [aaf41c]
9: /lib64/libpthread.so.0 (7ffff6426000+0x814a) [7ffff642e14a]
10: /lib64/libc.so.6 (clone+0x43) [7ffff615ddc3]

Thread 29 "dpdk-vhost-evt" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe839c400 (LWP 962487)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 return ret;
Missing separate debuginfos, use: yum debuginfo-install elfutils-libelf-0.182-3.el8.x86_64 libgcc-8.4.1-1.el8.x86_64 libibverbs-2307mlnx47-1.2310007.x86_64 libnl3-3.5.0-1.el8.x86_64 libpcap-1.9.1-5.el8.x86_64 numactl-libs-2.0.12-11.el8.x86_64 openssl-libs-1.1.1k-5.el8_5.x86_64 zlib-1.2.11-17.el8.x86_64
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff6082db5 in __GI_abort () at abort.c:79
#2 0x0000000000a80330 in __rte_panic (funcname=0x3461330 <__func__.33032> "vq_assert_lock__", format=0x345eb90 "VHOST_CONFIG: (%s) %s() called without access lock taken.\n%.0s") at ../lib/eal/common/eal_common_debug.c:26
#3 0x00000000007bac83 in vq_assert_lock__ (dev=0x17ffc2500, vq=0x17ffa2280, func=0x3461370 <__func__.33676> "vhost_user_set_vring_addr") at ../lib/vhost/vhost.h:547
#4 0x00000000007bd0ef in vhost_user_set_vring_addr (pdev=0x7fffe8399270, ctx=0x7fffe8398fc0, main_fd=115) at ../lib/vhost/vhost_user.c:930
#5 0x00000000007c264f in vhost_user_msg_handler (vid=0, fd=115) at ../lib/vhost/vhost_user.c:3197
#6 0x000000000076b797 in vhost_user_read_cb (connfd=115, dat=0x7fffe0000dd0, remove=0x7fffe8399354) at ../lib/vhost/socket.c:318
#7 0x0000000000769793 in fdset_event_dispatch (arg=0x3cb3940 <vhost_user+8192>) at ../lib/vhost/fd_man.c:282
#8 0x0000000000a970ca in control_thread_start (arg=0x9a5ed30) at ../lib/eal/common/eal_common_thread.c:282
#9 0x0000000000aaf41c in thread_start_wrapper (arg=0x7fffffffe090) at ../lib/eal/unix/rte_thread.c:114
#10 0x00007ffff642e14a in start_thread (arg=<optimized out>) at pthread_create.c:479
#11 0x00007ffff615ddc3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)

--
You are receiving this mail because:
You are the assignee for the bug.