On Mon, Sep 05, 2016 at 03:06:09PM +0200, Maxime Coquelin wrote: > > > On 09/02/2016 07:29 PM, Michael S. Tsirkin wrote: > > On Fri, Sep 02, 2016 at 10:57:17AM +0200, Maxime Coquelin wrote: > > > > > > > > > On 09/01/2016 03:46 PM, Michael S. Tsirkin wrote: > > > > On Wed, Aug 31, 2016 at 01:19:47PM +0200, Maxime Coquelin wrote: > > > > > > > > > > > > > > > On 08/14/2016 11:42 AM, Prerna Saxena wrote: > > > > > > On 14/08/16 8:21 am, "Michael S. Tsirkin" <m...@redhat.com> wrote: > > > > > > > > > > > > > > > > > > > On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote: > > > > > > > > > > > > > > > > On 12/08/16 12:08 pm, "Fam Zheng" <f...@redhat.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, 08/10 18:30, Michael S. Tsirkin wrote: > > > > > > > > > > From: Prerna Saxena <prerna.sax...@nutanix.com> > > > > > > > > > > > > > > > > > > > > The set_mem_table command currently does not seek a reply. > > > > > > > > > > Hence, there is > > > > > > > > > > no easy way for a remote application to notify to QEMU when > > > > > > > > > > it finished > > > > > > > > > > setting up memory, or if there were errors doing so. > > > > > > > > > > > > > > > > > > > > As an example: > > > > > > > > > > (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a > > > > > > > > > > vhost-user net > > > > > > > > > > application). SET_MEM_TABLE does not require a reply > > > > > > > > > > according to the spec. > > > > > > > > > > (2) Qemu commits the memory to the guest. > > > > > > > > > > (3) Guest issues an I/O operation over a new memory region > > > > > > > > > > which was configured on (1). > > > > > > > > > > (4) The application has not yet remapped the memory, but it > > > > > > > > > > sees the I/O request. > > > > > > > > > > (5) The application cannot satisfy the request because it > > > > > > > > > > does not know about those GPAs. > > > > > > > > > > > > > > > > > > > > While a guaranteed fix would require a protocol extension > > > > > > > > > > (committed separately), > > > > > > > > > > a best-effort workaround for existing applications is to > > > > > > > > > > send a GET_FEATURES > > > > > > > > > > message before completing the vhost_user_set_mem_table() > > > > > > > > > > call. > > > > > > > > > > Since GET_FEATURES requires a reply, an application that > > > > > > > > > > processes vhost-user > > > > > > > > > > messages synchronously would probably have completed the > > > > > > > > > > SET_MEM_TABLE before replying. > > > > > > > > > > > > > > > > > > > > Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com> > > > > > > > > > > Reviewed-by: Michael S. Tsirkin <m...@redhat.com> > > > > > > > > > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com> > > > > > > > > > > > > > > > > > > Sporadic hangs are seen with test-vhost-user after this patch: > > > > > > > > > > > > > > > > > > https://travis-ci.org/qemu/qemu/builds > > > > > > > > > > > > > > > > > > Reverting seems to fix it for me. > > > > > > > > > > > > > > > > > > Is this a known problem? > > > > > > > > > > > > > > > > > > Fam > > > > > > > > > > > > > > > > Hi Fam, > > > > > > > > Thanks for reporting the sporadic hangs. I had seen ‘make > > > > > > > > check’ pass on my Centos 6 environment, so missed this. > > > > > > > > I am setting up the docker test env to repro this, but I think > > > > > > > > I can guess the problem : > > > > > > > > > > > > > > > > In tests/vhost-user-test.c: > > > > > > > > > > > > > > > > static void chr_read(void *opaque, const uint8_t *buf, int size) > > > > > > > > { > > > > > > > > ..[snip].. > > > > > > > > > > > > > > > > case VHOST_USER_SET_MEM_TABLE: > > > > > > > > /* received the mem table */ > > > > > > > > memcpy(&s->memory, &msg.payload.memory, > > > > > > > > sizeof(msg.payload.memory)); > > > > > > > > s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, > > > > > > > > G_N_ELEMENTS(s->fds)); > > > > > > > > > > > > > > > > > > > > > > > > /* signal the test that it can continue */ > > > > > > > > g_cond_signal(&s->data_cond); > > > > > > > > break; > > > > > > > > ..[snip].. > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > The test seems to be marked complete as soon as mem_table is > > > > > > > > copied. > > > > > > > > However, this patch 3/3 changes the behaviour of the > > > > > > > > SET_MEM_TABLE vhost command implementation with qemu. > > > > > > > > SET_MEM_TABLE now sends out a new message GET_FEATURES, and the > > > > > > > > call is only completed once it receives features from the > > > > > > > > remote application. (or the test framework, as is the case > > > > > > > > here.) > > > > > > > > > > > > > > Hmm but why does it matter that data_cond is woken up? > > > > > > > > > > > > Michael, sorry, I didn’t quite understand that. Could you pls > > > > > > explain ? > > > > > > > > > > > > > > > > > > > > > > > > > > > > While the test itself can be modified (Do not signal completion > > > > > > > > until we’ve sent a follow-up response to GET_FEATURES), I am > > > > > > > > now wondering if this patch may break existing vhost > > > > > > > > applications too ? If so, reverting it possibly better. > > > > > > > > > > > > > > What bothers me is that the new feature might cause the same > > > > > > > issue once we enable it in the test. > > > > > > > > > > > > No it wont. The new feature is a protocol extension, and only works > > > > > > if it has been negotiated with. If not negotiated, that part of > > > > > > code is never executed. > > > > > > > > > > > > > > > > > > > > How about a patch to tests/vhost-user-test.c adding the new > > > > > > > protocol feature? I would be quite interested to see what > > > > > > > is going on with it. > > > > > > > > > > > > Yes that can be done. But you can see that the protocol extension > > > > > > patch will not change the behaviour of the _existing_ test. > > > > > > > > > > > > > > > > > > > > > > > > > > > > What confuses me is why it doesn’t fail all the time, but only > > > > > > > > about 20% to 30% time as Fam reports. > > > > > > > > > > > > > > And succeeds every time on my systems :( > > > > > > > > > > > > +1 to that :( I have had no luck repro’ing it. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thoughts : Michael, Fam, MarcAndre ? > > > > > > > > > > I have managed to reproduce the hang by adding some debug prints into > > > > > vhost_user_get_features(). > > > > > > > > > > Doing this the issue is reproducible quite easily. > > > > > Another way to reproduce it in one shot is to strace (with following > > > > > forks option) vhost-user-test execution. > > > > > > > > > > So, by adding debug prints at vhost_user_get_features() entry and > > > > > exit, > > > > > we can see we never return from this function when hang happens. > > > > > Strace of Qemu instance shows that its thread keeps retrying to > > > > > receive > > > > > GET_FEATURE reply: > > > > > > > > > > write(1, "vhost_user_get_features IN: \n", 29) = 29 > > > > > sendmsg(11, {msg_name=NULL, msg_namelen=0, > > > > > msg_iov=[{iov_base="\1\0\0\0\1\0\0\0\0\0\0\0", iov_len=12}], > > > > > msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 12 > > > > > recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN > > > > > nanosleep({0, 100000}, 0x7fff29f8dd70) = 0 > > > > > ... > > > > > recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN > > > > > nanosleep({0, 100000}, 0x7fff29f8dd70) = 0 > > > > > > > > > > The reason is that vhost-user-test never replies to Qemu, > > > > > because its thread handling the GET_FEATURES command is waiting for > > > > > the s->data_mutex lock. > > > > > This lock is held by the other vhost-user-test thread, executing > > > > > read_guest_mem(). > > > > > > > > > > The lock is never released because the thread is blocked in read > > > > > syscall, when read_guest_mem() is doing the readl(). > > > > > > > > > > This is because on Qemu side, the thread polling the qtest socket is > > > > > waiting for the qemu_global_mutex (in os_host_main_loop_wait()), but > > > > > the mutex is held by the thread trying to get the GET_FEATURE reply > > > > > (the TCG one). > > > > > > > > > > So here is the deadlock. > > > > > > > > > > That said, I don't see a clean way to solve this. > > > > > Any thoughts? > > > > > > > > > > Regards, > > > > > Maxime > > > > > > > > My thought is that we really need to do what I said: > > > > avoid doing GET_FEATURES (and setting reply_ack) > > > > on the first set_mem, and I quote: > > > > > > > > OK this all looks very reasonable (and I do like patch 1 too) > > > > but there's one source of waste here: we do not need to > > > > synchronize when we set up device the first time > > > > when hdev->memory_changed is false. > > > > > > > > I think we should test that and skip synch in both patches > > > > unless hdev->memory_changed is set. > > > > > > > > with that change test will start passing. > > > > > > Actually, it looks like memory_changed is true even at first > > > SET_MEM_TABLE request. > > > > > > Thanks, > > > Maxime > > > > Let's add another flag then? What we care about is that it's not > > the first time set specify translations for a given address. > > I added a dedicated flag, that skips sync on two conditions: > 1. First set_mem_table call > 2. If only a new regions are added > > It solves the hang seen with vhost-user-test app, and I think the patch > makes sense. > > But IMHO the problem is deeper than that, and could under some > conditions still hang when running in TCG mode. > Imagine Qemu sends a random "GET_FEATURE" request after the > set_mem_table, and vhost-user-test read_guest_mem() is executed just before > this second call (Let's say it was not scheduled for some time). > > In this case, read_guest_mem() thread owns the data_mutex, and start > doing readl() calls. On Qemu side, as we are sending an update of the > mem table, we own the qemu_global_mutex, and the deadlock happen again: > - Vhost-user-test > * read_guest_mem() thread: Blocked in readl(), waiting for Qemu to > handle it (TCG mode only), owning the data_mutex lock. > * Command handler thread: Received GET_FEATURE event, but wait for > data_mutex ownership to handle it. > > - Qemu > * FDs polling thread: Wait for qemu_global_mutex ownership, to be > able to handle the readl() request from vhost-user-test. > * TCG thread: Own the qemu_global_mutex, and poll to receive the > GET_FEATURE reply. > > Maybe the GET_FEATURE case is not realistic, but what about > GET_VRING_BASE, that get called by vhost_net_stop()? > > Thanks in advance, > Maxime
I think most applications expect to be able to handle GET_VRING_BASE at any time. If application isn't ready to handle GET_VRING_BASE at any time, we won't be able to stop the device. This is not the case for this test but that's because it's just a unit test, so incomplete. -- MST