I am very sorry that my previous description was not accurate. Below I will describe the steps to reproduce this problem and my analysis in detail.The conditions for reproducing this problem are quite demanding. First, let me introduce my environment. I use DPDK vdpa to drive DPU to implement a vhost-user type network card. DPDK vdpa software communicates with QEMU through the vhost protocol. The steps to reproduce are as follows:
1. Start a Windows virtual machine. 2. Use the vish command to hotplug and unplug a 32 queues vhost-user-net network card. 3. After a while, QEMU will crash. I added some logs to locate this problem. The following is how I located the root cause of this problem based on logs and code analysis. The steps to reproduce the problem involve hot plugging and hot unplugging of network cards. Hot plugging of network cards involves two processes. Process A is to insert the device into qemu, and process B is the process of guest operating system initializing the network card. When process A is completed, the virsh attach-device command returns. At this time, you can call virsh detach-device to perform hot unplug operations. Generally, this will not be a problem because the network card initializes very quickly. However, since my network card is a vhost-user type network card, and it is implemented by DPDK vdpa, there is an asynchronous situation. When the Guest operating system is initializing the network card, some messages from vhost-user returned to QEMU may be delayed. The problem occurs here. For example, the message VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG is responsible for assigning the value of VhostUserHostNotifier->addr, which is the hot plug process B. There are also two processes for hot unplugging devices. libvirt will send two QMP commands to qemu: one is device_del and the other is netdev_del. If this message arrives after the first hot unplug command, there will be problems. The following is a detailed analysis: The key function of the device_del command execution: virtio_set_status->vhost_dev_stop. In the vhost_dev_stop function, all queues will be traversed and vhost_user_get_vring_base-->vhost_user_host_notifier_remove will be called because VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG has not arrived in some queues at this time, so VhostUserHostNotifier->addr has no value at this time, so nothing will be done at this time. For those with values, vhost_user_host_notifier_free will be submitted to the rcu thread and clear n->addr. Next, libvirt will send the netdev_del QMP command. netdev_del--> vhost_user_cleanup Because some queues VhostUserHostNotifier->addr are not empty at this time, there will be the following call path: 1. vhost_user_host_notifier_remove->call_rcu(n, vhost_user_host_notifier_free, rcu); 2. g_free_rcu(n, rcu); The same rcu was submitted twice. In call_rcu1, if two identical nodes are added to the node list, only one will be successfully added, but rcu_call_count will be added twice. When rcu processes rcu node, it will check whether rcu_call_count is empty. If not, it will take the node from node list. Because the actual node in node list is one less than rcu_call_count, it will cause rcu_call_count to be not empty, but there is no node in node list. In this way, rcu thread will abort, the code is as follows: if (head == &dummy && qatomic_mb_read(&tail) == &dummy.next) { abort(); } This is the reason why QEMU crashed. Since the latest QEMU cannot run in our environment due to incompatibility of vhost-user messages, and this problem is difficult to reproduce, I was unable to reproduce the problem on the latest qemu. However, from the upstream code, since VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG is asynchronous, we cannot assume that all VhostUserHostNotifier->addr in vhost_user_cleanup are empty. If addr is not empty, there will be a problem. Otherwise, vhost_user_host_notifier_remove is not necessary to call in vhost_user_cleanup. Therefore, the two tasks of vhost_user_host_notifier_free and free VhostUserHostNotifier need to be placed in a rcu node, and use n->need_free to determine whether it is free VhostUserHostNotifier. Stefano Garzarella <sgarz...@redhat.com> 于2024年8月27日周二 16:00写道: > > On Wed, Aug 07, 2024 at 05:55:08PM GMT, yaozhenguo wrote: > >When hotplug and hotunplug vhost-user-net device quickly. > > I'd replace the . with , > > >qemu will crash. BT is as below: > > > >0 __pthread_kill_implementation () at /usr/lib64/libc.so.6 > >1 raise () at /usr/lib64/libc.so.6 > >2 abort () at /usr/lib64/libc.so.6 > >3 try_dequeue () at ../util/rcu.c:235 > >4 call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:288 > >5 qemu_thread_start (args=0x55b10d9ceaa0) at ../util/qemu-thread-posix.c:541 > >6 start_thread () at /usr/lib64/libc.so.6 > >7 clone3 () at /usr/lib64/libc.so.6 > > > >1. device_del qmp process > > > >virtio_set_status > > vhost_dev_stop > > vhost_user_get_vring_base > > vhost_user_host_notifier_remove > > > >vhost_user_slave_handle_vring_host_notifier maybe called asynchronous after > ^ > Now it's called vhost_user_backend_handle_vring_host_notifier, I'd > suggest to use the new name. > > >vhost_user_host_notifier_remove. vhost_user_host_notifier_remove will > >not > >all call_rcu because of notifier->addr is NULL at this time. > > s/all/call ? > Yes > > > >2. netdev_del qmp process > > > >vhost_user_cleanup > > vhost_user_host_notifier_remove > > g_free_rcu > > > >vhost_user_host_notifier_remove and g_free_rcu will sumbit same rcu_head > > s/sumbit/submit > Sorry about this mistake > >to rcu node list. rcu_call_count add twice but only one node is added. > >rcu thread will abort when calling try_dequeue with node list is empty. > > What's not clear to me is how 1 and 2 are related, could you explain > that? > Libvirt will send two QMP commands in succession when the network card is hot-unplugged, which can help understand the problem. However, I did not describe it clearly. Please see the detailed description at the top. > >Fix this by moving g_free(n) to vhost_user_host_notifier_free. > >` > >Fixes: 503e355465 ("virtio/vhost-user: dynamically assign > >VhostUserHostNotifiers") > >Signed-off-by: yaozhenguo <yaozhen...@jd.com> > >--- > > hw/virtio/vhost-user.c | 23 +++++++++++------------ > > include/hw/virtio/vhost-user.h | 1 + > > 2 files changed, 12 insertions(+), 12 deletions(-) > > > >diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > >index 00561daa06..7ab37c0da2 100644 > >--- a/hw/virtio/vhost-user.c > >+++ b/hw/virtio/vhost-user.c > >@@ -1188,6 +1188,12 @@ static void > >vhost_user_host_notifier_free(VhostUserHostNotifier *n) > > assert(n && n->unmap_addr); > > munmap(n->unmap_addr, qemu_real_host_page_size()); > > n->unmap_addr = NULL; > >+ if (n->need_free) { > >+ memory_region_transaction_begin(); > >+ object_unparent(OBJECT(&n->mr)); > >+ memory_region_transaction_commit(); > >+ g_free(n); > >+ } > > } > > > > /* > >@@ -1195,7 +1201,7 @@ static void > >vhost_user_host_notifier_free(VhostUserHostNotifier *n) > > * under rcu. > > */ > > static void vhost_user_host_notifier_remove(VhostUserHostNotifier *n, > >- VirtIODevice *vdev) > >+ VirtIODevice *vdev, bool free) > > { > > if (n->addr) { > > if (vdev) { > >@@ -1204,6 +1210,7 @@ static void > >vhost_user_host_notifier_remove(VhostUserHostNotifier *n, > > assert(!n->unmap_addr); > > n->unmap_addr = n->addr; > > n->addr = NULL; > >+ n->need_free = free; > > call_rcu(n, vhost_user_host_notifier_free, rcu); > > } > > } > >@@ -1280,7 +1287,7 @@ static int vhost_user_get_vring_base(struct vhost_dev > >*dev, > > > > VhostUserHostNotifier *n = fetch_notifier(u->user, ring->index); > > if (n) { > >- vhost_user_host_notifier_remove(n, dev->vdev); > >+ vhost_user_host_notifier_remove(n, dev->vdev, false); > > } > > > > ret = vhost_user_write(dev, &msg, NULL, 0); > >@@ -1562,7 +1569,7 @@ static int > >vhost_user_backend_handle_vring_host_notifier(struct vhost_dev *dev, > > * new mapped address. > > */ > > n = fetch_or_create_notifier(user, queue_idx); > >- vhost_user_host_notifier_remove(n, vdev); > >+ vhost_user_host_notifier_remove(n, vdev, false); > > > > if (area->u64 & VHOST_USER_VRING_NOFD_MASK) { > > return 0; > >@@ -2737,13 +2744,7 @@ static void vhost_user_state_destroy(gpointer data) > > { > > VhostUserHostNotifier *n = (VhostUserHostNotifier *) data; > > if (n) { > >- vhost_user_host_notifier_remove(n, NULL); > >- object_unparent(OBJECT(&n->mr)); > >- /* > >- * We can't free until vhost_user_host_notifier_remove has > >- * done it's thing so schedule the free with RCU. > >- */ > >- g_free_rcu(n, rcu); > >+ vhost_user_host_notifier_remove(n, NULL, true); > > I'm not sure I understand the problem well, but could it be that now we > don't see the problem anymore, but we have a memory leak? > > Here for example could it be the case that `n->addr` is NULL and > therefore `vhost_user_host_notifier_free` with `n->need_free = true` > will never be submitted? > Please see the detailed description at the top. > > } > > } > > > >@@ -2765,9 +2766,7 @@ void vhost_user_cleanup(VhostUserState *user) > > if (!user->chr) { > > return; > > } > >- memory_region_transaction_begin(); > > user->notifiers = (GPtrArray *) g_ptr_array_free(user->notifiers, > > true); > >- memory_region_transaction_commit(); > > This is no longer necessary, because the `user->notifiers` free function > no longer calls `object_unparent(OBJECT(&n->mr))`, right? > > Maybe it's worth mentioning in the commit description. > Yes, I originally thought so. But after reading the code carefully, I found that my understanding was not correct.It is still needed here, because the user->notifiers free function will call virtio_queue_set_host_notifier_mr. This should also require memory_region_transaction_commit. Is this correct? > > user->chr = NULL; > > } > > > >diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h > >index 324cd8663a..a171f29e0b 100644 > >--- a/include/hw/virtio/vhost-user.h > >+++ b/include/hw/virtio/vhost-user.h > >@@ -54,6 +54,7 @@ typedef struct VhostUserHostNotifier { > > void *addr; > > void *unmap_addr; > > int idx; > >+ bool need_free; > > } VhostUserHostNotifier; > > > > /** > >-- > >2.43.0 > > >