Hi Yuanhan and all,

Thanks for the review. I have changed the title and brought more people into the discussion.
The issues can be reproduced easily with OVS. The steps are:

1) Start the VM via virsh start.
2) Add the port via ovs-vsctl add-port.
3) Delete the port via ovs-vsctl del-port.
4) Shut down the VM via virsh destroy.

I run a script that repeats these steps for 2 VMs, and it takes about 1 hour to reproduce the issue. However, I found more issues (which I can also reproduce):

1) See the code below:

static void
vhost_user_read_cb(int connfd, void *dat, int *remove)
{
        struct vhost_user_connection *conn = dat;
        struct vhost_user_socket *vsocket = conn->vsocket;
        int ret;

        ret = vhost_user_msg_handler(conn->vid, connfd);
        if (ret < 0) {
                close(connfd);
                *remove = 1;
                vhost_destroy_device(conn->vid);

                pthread_mutex_lock(&vsocket->conn_mutex);
                TAILQ_REMOVE(&vsocket->conn_list, conn, next);
                pthread_mutex_unlock(&vsocket->conn_mutex);

                free(conn);

-------------------------> [Zhike Wang] Here the vsocket is on neither conn_list nor reconn_list for a short time slot. If rte_vhost_driver_unregister() runs in this window, vsocket is freed, so the vhost_user_create_client() call below may dereference a dangling pointer and crash. (A standalone sketch of this window follows the diff below.)

                if (vsocket->reconnect)
                        vhost_user_create_client(vsocket);
        }
}

2) Regarding the above issue, I think we need one mutex protecting both conn_list and reconn_list, so that a vsocket is always on exactly one of them, never both or neither. So I tried using vhost_user.mutex:

diff --git a/dpdk-16.11.3/lib/librte_vhost/fd_man.c b/dpdk-16.11.3/lib/librte_vhost/fd_man.c
@@ -233,6 +234,7 @@
 		poll(pfdset->rwfds, numfds, 1000 /* millisecs */);

 		need_shrink = 0;
+		pthread_mutex_lock(&vhost_user.mutex);
 		for (i = 0; i < numfds; i++) {
 			pthread_mutex_lock(&pfdset->fd_mutex);
@@ -264,7 +266,10 @@
 			rcb(fd, dat, &remove1);
 		if (wcb && pfd->revents & (POLLOUT | FDPOLLERR))
 			wcb(fd, dat, &remove2);
+
+		pthread_mutex_lock(&pfdset->fd_mutex);
 		pfdentry->busy = 0;
+		pthread_mutex_unlock(&pfdset->fd_mutex);
 		/*
 		 * fdset_del needs to check busy flag.
 		 * We don't allow fdset_del to be called in callback
@@ -285,5 +290,6 @@
 		if (need_shrink)
 			fdset_shrink(pfdset);
+		pthread_mutex_unlock(&vhost_user.mutex);
@@ -390,6 +395,8 @@ struct vhost_user_reconnect_list {
 	struct vhost_user_reconnect *reconn, *next;

 	while (1) {
+
+		pthread_mutex_lock(&vhost_user.mutex);
 		pthread_mutex_lock(&reconn_list.mutex);

 		/*
@@ -417,11 +424,18 @@ struct vhost_user_reconnect_list {
 				"%s: connected\n", reconn->vsocket->path);
 			vhost_user_add_connection(reconn->fd, reconn->vsocket);
remove_fd:
 		pthread_mutex_unlock(&reconn_list.mutex);
+		pthread_mutex_unlock(&vhost_user.mutex);
 		sleep(1);
 	}
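To make issue 1 concrete, below is a minimal standalone sketch of the same window. The names (vsocket_stub, list_mutex, read_cb_thread, unregister_thread) are made up for illustration; this is not DPDK code. Built with gcc -pthread and run under AddressSanitizer, the dangling dereference fires:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct vsocket_stub { int reconnect; };

static pthread_mutex_t list_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct vsocket_stub *vs;

/* Models vhost_user_read_cb(): drop the list entry, release the lock,
 * then dereference vs -- by now nothing ties vs to any list. */
static void *read_cb_thread(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&list_mutex);
        /* ... TAILQ_REMOVE(&conn_list, ...) would happen here ... */
        pthread_mutex_unlock(&list_mutex);

        usleep(100000);         /* widen the race window for the demo */

        if (vs->reconnect)      /* use-after-free: vs was freed meanwhile */
                puts("would call vhost_user_create_client(vs)");
        return NULL;
}

/* Models rte_vhost_driver_unregister(): vs is on no list, so there is
 * nothing left to wait for, and it frees vs inside the window. */
static void *unregister_thread(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&list_mutex);
        pthread_mutex_unlock(&list_mutex);
        free(vs);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        vs = calloc(1, sizeof(*vs));
        vs->reconnect = 1;

        pthread_create(&a, NULL, read_cb_thread, NULL);
        pthread_create(&b, NULL, unregister_thread, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}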
Then, with the diff above applied, I met a deadlock (another thread had entered rte_vhost_driver_unregister() and was waiting for vhost_user.mutex):

(gdb) bt
#0  0x00007febfaf8ebcd in poll () from /lib64/libc.so.6
#1  0x00007fec029b6836 in poll (__timeout=<optimized out>, __nfds=2, __fds=0x7febe80008c0) at /usr/include/bits/poll2.h:46
#2  time_poll (pollfds=pollfds@entry=0x7febe80008c0, n_pollfds=2, handles=handles@entry=0x0, timeout_when=4098990819, elapsed=elapsed@entry=0x7febf7450290) at lib/timeval.c:305
#3  0x00007fec0299afea in poll_block () at lib/poll-loop.c:364
#4  0x00007fec0297da25 in ovsrcu_synchronize () at lib/ovs-rcu.c:231
#5  0x00007fec029e9477 in destroy_device (vid=<optimized out>) at lib/netdev-dpdk.c:2651
#6  0x00007fec0195db04 in vhost_destroy_device (vid=<optimized out>) at /lib/librte_vhost/vhost.c:277
#7  0x00007fec0195cbdc in vhost_user_read_cb (connfd=<optimized out>, dat=0x7febc80008c0, remove=0x7febf7451420) at /lib/librte_vhost/socket.c:269
#8  0x00007fec0195c200 in fdset_event_dispatch (pfdset=pfdset@entry=0x7fec01b70180 <vhost_user+8192>) at /lib/librte_vhost/fd_man.c:266
#9  0x00007fec0195d647 in rte_vhost_driver_session_start () at /lib/librte_vhost/socket.c:714
#10 0x00007fec029e661b in start_vhost_loop (dummy=<optimized out>) at lib/netdev-dpdk.c:2736
#11 0x00007fec0297f796 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:348
#12 0x00007febfb9b6dc5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007febfaf9921d in clone () from /lib64/libc.so.6

So now I believe these issues are hard to fix inside the module itself. I would like to present the identified issues here, and I want to hear your proposal or fix.

Br,
Zhike Wang

-----Original Message-----
From: Yuanhan Liu [mailto:y...@fridaylinux.org]
Sent: Thursday, January 18, 2018 10:04 PM
To: 王志克
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v3] lib/librte_vhost: move fdset_del out of conn_mutex

Hi,

Apologies for the late review.

On Tue, Jan 02, 2018 at 02:08:36AM -0800, zhike wang wrote:
> From: wang zhike <wangzh...@jd.com>
>
> v3:
> * Fix duplicate variable name, which leads to unexpected memory write.
> v2:
> * Move fdset_del before conn destroy.
> * Fix coding style.

Note that we prefer to put the change logs after "---" below the Signed-off-by, so that those change logs won't be tracked in the git log history.

> This patch fixes the below race condition:
> 1. One thread calls: rte_vhost_driver_unregister -> lock conn_mutex
>    -> fdset_del -> loop to check fd.busy.
> 2. Another thread calls fdset_event_dispatch, and the busy flag is
>    changed AFTER handling on the fd, i.e. rcb(). However, the rcb,
>    such as vhost_user_read_cb(), would try to acquire the conn_mutex.
>
> So the issue is that the 1st thread loops checking the flag while
> holding the mutex, while the 2nd thread is blocked by the mutex and
> cannot change the flag. A deadlock is then observed.

I would then change the title to "vhost: fix deadlock". I'm also keen to know how you reproduce this issue with a real-life app (say, OVS) and how easy it is to reproduce.

> Signed-off-by: zhike wang <wangzh...@jd.com>

Again, you need to fix your git config file regarding your name.
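[Zhike Wang] To show the race from the commit message in isolation: a minimal standalone sketch with hypothetical names (not DPDK code), which deadlocks by design when built with gcc -pthread:

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t conn_mutex = PTHREAD_MUTEX_INITIALIZER;
static volatile int busy = 1;   /* models pfdentry->busy, already set */

/* Models fdset_event_dispatch(): the callback (rcb) must take
 * conn_mutex before the dispatcher can clear the busy flag. */
static void *dispatch_thread(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&conn_mutex);        /* blocks: unregister holds it */
        busy = 0;
        pthread_mutex_unlock(&conn_mutex);
        return NULL;
}

int main(void)
{
        pthread_t t;

        /* Models rte_vhost_driver_unregister(): take conn_mutex, then
         * spin in fdset_del() waiting for busy to drop. It never does. */
        pthread_mutex_lock(&conn_mutex);
        pthread_create(&t, NULL, dispatch_thread, NULL);
        while (busy)
                usleep(1000);                   /* spins forever: deadlock */
        pthread_mutex_unlock(&conn_mutex);      /* never reached */

        pthread_join(t, NULL);
        return 0;
}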
> ---
>  lib/librte_vhost/socket.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
> index 422da00..ea01327 100644
> --- a/lib/librte_vhost/socket.c
> +++ b/lib/librte_vhost/socket.c
> @@ -749,6 +749,9 @@ struct vhost_user_reconnect_list {
>  		struct vhost_user_socket *vsocket = vhost_user.vsockets[i];
>
>  		if (!strcmp(vsocket->path, path)) {
> +			int del_fds[MAX_FDS];
> +			int num_of_fds = 0, fd_index;
> +

I think the naming could be a bit shorter, like "fds, nr_fds (or nb_fds), fd_idx".

>  			if (vsocket->is_server) {
>  				fdset_del(&vhost_user.fdset, vsocket->socket_fd);
>  				close(vsocket->socket_fd);
> @@ -757,13 +760,26 @@ struct vhost_user_reconnect_list {
>  				vhost_user_remove_reconnect(vsocket);
>  			}
>
> +			/* fdset_del() must be called without conn_mutex. */
> +			pthread_mutex_lock(&vsocket->conn_mutex);
> +			for (conn = TAILQ_FIRST(&vsocket->conn_list);
> +			     conn != NULL;
> +			     conn = next) {
> +				next = TAILQ_NEXT(conn, next);
> +
> +				del_fds[num_of_fds++] = conn->connfd;
> +			}
> +			pthread_mutex_unlock(&vsocket->conn_mutex);
> +
> +			for (fd_index = 0; fd_index < num_of_fds; fd_index++)
> +				fdset_del(&vhost_user.fdset, del_fds[fd_index]);
> +
>  			pthread_mutex_lock(&vsocket->conn_mutex);
>  			for (conn = TAILQ_FIRST(&vsocket->conn_list);
>  			     conn != NULL;
>  			     conn = next) {
>  				next = TAILQ_NEXT(conn, next);
>
> -				fdset_del(&vhost_user.fdset, conn->connfd);

If you log the fd here and invoke fdset_del() and close() after the loop, you could then avoid the extra loop you added above.

	--yliu

>  				RTE_LOG(INFO, VHOST_CONFIG,
>  					"free connfd = %d for device '%s'\n",
>  					conn->connfd, path);
> --
> 1.8.3.1
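[Zhike Wang] For what it is worth, here is how I read the single-loop suggestion above: an untested sketch against rte_vhost_driver_unregister() (it assumes the surrounding function's variables vsocket and path and the DPDK 16.11 definitions), also using the shorter variable names suggested. Each connfd is recorded and the conn entry freed in the single existing walk under conn_mutex; fdset_del()/close() then run after the lock is dropped:

	int del_fds[MAX_FDS];
	int nr_fds = 0, fd_idx;
	struct vhost_user_connection *conn, *next;

	pthread_mutex_lock(&vsocket->conn_mutex);
	for (conn = TAILQ_FIRST(&vsocket->conn_list);
	     conn != NULL;
	     conn = next) {
		next = TAILQ_NEXT(conn, next);

		del_fds[nr_fds++] = conn->connfd;
		RTE_LOG(INFO, VHOST_CONFIG,
			"free connfd = %d for device '%s'\n",
			conn->connfd, path);
		TAILQ_REMOVE(&vsocket->conn_list, conn, next);
		free(conn);
	}
	pthread_mutex_unlock(&vsocket->conn_mutex);

	/* fdset_del() must run without conn_mutex held, per the deadlock above. */
	for (fd_idx = 0; fd_idx < nr_fds; fd_idx++) {
		fdset_del(&vhost_user.fdset, del_fds[fd_idx]);
		close(del_fds[fd_idx]);
	}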