On Fri, Sep 02, 2016 at 10:57:17AM +0200, Maxime Coquelin wrote:
> On 09/01/2016 03:46 PM, Michael S. Tsirkin wrote:
> > On Wed, Aug 31, 2016 at 01:19:47PM +0200, Maxime Coquelin wrote:
> > > On 08/14/2016 11:42 AM, Prerna Saxena wrote:
> > > > On 14/08/16 8:21 am, "Michael S. Tsirkin" <m...@redhat.com> wrote:
> > > > > On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote:
> > > > > > On 12/08/16 12:08 pm, "Fam Zheng" <f...@redhat.com> wrote:
> > > > > > > On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
> > > > > > > > From: Prerna Saxena <prerna.sax...@nutanix.com>
> > > > > > > >
> > > > > > > > The set_mem_table command currently does not seek a reply. Hence,
> > > > > > > > there is no easy way for a remote application to notify QEMU when
> > > > > > > > it has finished setting up memory, or whether there were errors
> > > > > > > > doing so.
> > > > > > > >
> > > > > > > > As an example:
> > > > > > > > (1) QEMU sends a SET_MEM_TABLE to the backend (e.g., a vhost-user
> > > > > > > >     net application). SET_MEM_TABLE does not require a reply
> > > > > > > >     according to the spec.
> > > > > > > > (2) QEMU commits the memory to the guest.
> > > > > > > > (3) The guest issues an I/O operation over a new memory region
> > > > > > > >     which was configured in (1).
> > > > > > > > (4) The application has not yet remapped the memory, but it sees
> > > > > > > >     the I/O request.
> > > > > > > > (5) The application cannot satisfy the request because it does
> > > > > > > >     not know about those GPAs.
> > > > > > > >
> > > > > > > > While a guaranteed fix would require a protocol extension
> > > > > > > > (committed separately), a best-effort workaround for existing
> > > > > > > > applications is to send a GET_FEATURES message before completing
> > > > > > > > the vhost_user_set_mem_table() call.
> > > > > > > > Since GET_FEATURES requires a reply, an application that
> > > > > > > > processes vhost-user messages synchronously would probably have
> > > > > > > > completed the SET_MEM_TABLE before replying.
> > > > > > > >
> > > > > > > > Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com>
> > > > > > > > Reviewed-by: Michael S. Tsirkin <m...@redhat.com>
> > > > > > > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > > > > > >
> > > > > > > Sporadic hangs are seen with test-vhost-user after this patch:
> > > > > > >
> > > > > > > https://travis-ci.org/qemu/qemu/builds
> > > > > > >
> > > > > > > Reverting seems to fix it for me.
> > > > > > >
> > > > > > > Is this a known problem?
> > > > > > >
> > > > > > > Fam
> > > > > >
> > > > > > Hi Fam,
> > > > > > Thanks for reporting the sporadic hangs. I had seen 'make check'
> > > > > > pass in my CentOS 6 environment, so I missed this.
> > > > > > I am setting up the docker test env to reproduce it, but I think I
> > > > > > can guess the problem:
> > > > > >
> > > > > > In tests/vhost-user-test.c:
> > > > > >
> > > > > > static void chr_read(void *opaque, const uint8_t *buf, int size)
> > > > > > {
> > > > > >     ..[snip]..
> > > > > >
> > > > > >     case VHOST_USER_SET_MEM_TABLE:
> > > > > >         /* received the mem table */
> > > > > >         memcpy(&s->memory, &msg.payload.memory,
> > > > > >                sizeof(msg.payload.memory));
> > > > > >         s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds,
> > > > > >                                             G_N_ELEMENTS(s->fds));
> > > > > >
> > > > > >         /* signal the test that it can continue */
> > > > > >         g_cond_signal(&s->data_cond);
> > > > > >         break;
> > > > > >     ..[snip]..
> > > > > > }
> > > > > >
> > > > > > The test is marked complete as soon as the mem table is copied.
> > > > > > However, patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost
> > > > > > command implementation within QEMU.
> > > > > > SET_MEM_TABLE now sends out a new GET_FEATURES message, and the
> > > > > > call only completes once it receives the features from the remote
> > > > > > application (or the test framework, as is the case here).
> > > > >
> > > > > Hmm, but why does it matter that data_cond is woken up?
> > > >
> > > > Michael, sorry, I didn't quite understand that. Could you please explain?
> > > >
> > > > > > While the test itself can be modified (do not signal completion
> > > > > > until we've sent a follow-up response to GET_FEATURES), I am now
> > > > > > wondering whether this patch may break existing vhost applications
> > > > > > too. If so, reverting it is possibly better.
> > > > >
> > > > > What bothers me is that the new feature might cause the same issue
> > > > > once we enable it in the test.
> > > >
> > > > No, it won't. The new feature is a protocol extension, and only works
> > > > if it has been negotiated. If not negotiated, that part of the code is
> > > > never executed.
> > > >
> > > > > How about a patch to tests/vhost-user-test.c adding the new protocol
> > > > > feature? I would be quite interested to see what is going on with it.
> > > >
> > > > Yes, that can be done. But you can see that the protocol extension
> > > > patch will not change the behaviour of the _existing_ test.
> > > >
> > > > > > What confuses me is why it doesn't fail all the time, but only
> > > > > > about 20% to 30% of the time, as Fam reports.
> > > > >
> > > > > And succeeds every time on my systems :(
> > > >
> > > > +1 to that :( I have had no luck reproducing it.
> > > >
> > > > > > Thoughts: Michael, Fam, Marc-André?
> > >
> > > I have managed to reproduce the hang by adding some debug prints into
> > > vhost_user_get_features().
> > >
> > > Doing this, the issue is reproducible quite easily.
> > > Another way to reproduce it in one shot is to strace (with the
> > > follow-forks option) the vhost-user-test execution.
> > >
> > > So, by adding debug prints at vhost_user_get_features() entry and exit,
> > > we can see that we never return from this function when the hang
> > > happens. An strace of the QEMU instance shows that its thread keeps
> > > retrying to receive the GET_FEATURES reply:
> > >
> > > write(1, "vhost_user_get_features IN: \n", 29) = 29
> > > sendmsg(11, {msg_name=NULL, msg_namelen=0,
> > >         msg_iov=[{iov_base="\1\0\0\0\1\0\0\0\0\0\0\0", iov_len=12}],
> > >         msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 12
> > > recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
> > > nanosleep({0, 100000}, 0x7fff29f8dd70) = 0
> > > ...
> > > recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
> > > nanosleep({0, 100000}, 0x7fff29f8dd70) = 0
> > >
> > > The reason is that vhost-user-test never replies to QEMU, because its
> > > thread handling the GET_FEATURES command is waiting for the
> > > s->data_mutex lock. This lock is held by the other vhost-user-test
> > > thread, which is executing read_guest_mem().
> > >
> > > That lock is never released because the thread is blocked in the read
> > > syscall, while read_guest_mem() is doing the readl().
> > >
> > > This is because, on the QEMU side, the thread polling the qtest socket
> > > is waiting for the qemu_global_mutex (in os_host_main_loop_wait()), but
> > > that mutex is held by the thread trying to get the GET_FEATURES reply
> > > (the TCG one).
> > >
> > > So here is the deadlock.
> > >
> > > That said, I don't see a clean way to solve this.
> > > Any thoughts?
> > >
> > > Regards,
> > > Maxime
> >
> > My thought is that we really need to do what I said: avoid doing
> > GET_FEATURES (and setting reply_ack) on the first set_mem, and I quote:
> >
> > OK, this all looks very reasonable (and I do like patch 1 too), but
> > there's one source of waste here: we do not need to synchronize when we
> > set up the device for the first time, when hdev->memory_changed is
> > false.
> >
> > I think we should test that and skip the sync in both patches unless
> > hdev->memory_changed is set.
> >
> > With that change, the test will start passing.
>
> Actually, it looks like memory_changed is true even at the first
> SET_MEM_TABLE request.
>
> Thanks,
> Maxime
Let's add another flag then? What we care about is whether this is the
first time we specify translations for a given address.

-- 
MST