On Tue, Jan 17, 2017 at 06:53:17PM +0000, Felipe Franciosi wrote:
> 
> > On 17 Jan 2017, at 10:41, Michael S. Tsirkin <m...@redhat.com> wrote:
> > 
> > On Fri, Jan 13, 2017 at 10:29:46PM +0000, Felipe Franciosi wrote:
> >> 
> >>> On 13 Jan 2017, at 10:18, Michael S. Tsirkin <m...@redhat.com> wrote:
> >>> 
> >>> On Fri, Jan 13, 2017 at 05:15:22PM +0000, Felipe Franciosi wrote:
> >>>> 
> >>>>> On 13 Jan 2017, at 09:04, Michael S. Tsirkin <m...@redhat.com> wrote:
> >>>>> 
> >>>>> On Fri, Jan 13, 2017 at 03:09:46PM +0000, Felipe Franciosi wrote:
> >>>>>> Hi Marc-Andre,
> >>>>>> 
> >>>>>>> On 13 Jan 2017, at 07:03, Marc-André Lureau <mlur...@redhat.com> wrote:
> >>>>>>> 
> >>>>>>> Hi
> >>>>>>> 
> >>>>>>> ----- Original Message -----
> >>>>>>>> Currently, VQs are started as soon as a SET_VRING_KICK is received.
> >>>>>>>> That is too early in the VQ setup process, as the backend might not
> >>>>>>>> yet have
> >>>>>>> 
> >>>>>>> I think we may want to reconsider queue_set_started(), move it
> >>>>>>> elsewhere, since kick/call fds aren't mandatory to process the rings.
> >>>>>> 
> >>>>>> Hmm. The fds aren't mandatory, but I imagine in that case we should
> >>>>>> still receive SET_VRING_KICK/CALL messages without an fd (i.e. with the
> >>>>>> VHOST_MSG_VQ_NOFD_MASK flag set). Wouldn't that be the case?
> >>>>> 
> >>>>> Please look at docs/specs/vhost-user.txt, "Starting and stopping rings".
> >>>>> 
> >>>>> The spec says:
> >>>>>     Client must start ring upon receiving a kick (that is, detecting
> >>>>>     that file descriptor is readable) on the descriptor specified by
> >>>>>     VHOST_USER_SET_VRING_KICK, and stop ring upon receiving
> >>>>>     VHOST_USER_GET_VRING_BASE.
> >>>> 
> >>>> Yes, I have seen the spec, but there is a race with the current
> >>>> libvhost-user code which needs attention. My initial proposal (which got
> >>>> turned down) was to send a spurious notification upon seeing a callfd.
> >>>> Then I came up with this proposal. See below.
> >>>> 
> >>>>> 
> >>>>> 
> >>>>>>> 
> >>>>>>>> a callfd to notify in case it received a kick and fully processed the
> >>>>>>>> request/command. This patch only starts a VQ when a SET_VRING_CALL is
> >>>>>>>> received.
> >>>>>>> 
> >>>>>>> I don't like that much; as soon as the kick fd is received, it should
> >>>>>>> start polling it, imho. callfd is optional: it may have one and not
> >>>>>>> the other.
> >>>>>> 
> >>>>>> So the question is whether we should be receiving a SET_VRING_CALL
> >>>>>> anyway or not, regardless of an fd being sent. (I think we do, but I
> >>>>>> haven't done extensive testing with other device types.)
> >>>>> 
> >>>>> I would say not; only KICK is mandatory, and that is also not enough
> >>>>> to process the ring. You must wait for it to be readable.
> >>>> 
> >>>> The problem is that Qemu takes time between sending the kickfd and the
> >>>> callfd. Hence the race.
> >>>> Consider this scenario:
> >>>> 
> >>>> 1) Guest configures the device
> >>>> 2) Guest puts a request on a virtq
> >>>> 3) Guest kicks
> >>>> 4) Qemu starts configuring the backend
> >>>> 4.a) Qemu sends the masked callfds
> >>>> 4.b) Qemu sends the virtq sizes and addresses
> >>>> 4.c) Qemu sends the kickfds
> >>>> 
> >>>> (When using MQ, Qemu will only send the callfd once all VQs are
> >>>> configured)
> >>>> 
> >>>> 5) The backend starts listening on the kickfd upon receiving it
> >>>> 6) The backend picks up the guest's request
> >>>> 7) The backend processes the request
> >>>> 8) The backend puts the response on the used ring
> >>>> 9) The backend notifies the masked callfd
> >>>> 
> >>>> 4.d) Qemu sends the callfds
> >>>> 
> >>>> At which point the guest missed the notification and gets stuck.
> >>>> 
> >>>> Perhaps you prefer my initial proposal of sending a spurious
> >>>> notification when the backend sees a callfd?
> >>>> 
> >>>> Felipe
> >>> 
> >>> I thought we read the masked callfd when we unmask it,
> >>> and forward the interrupt. See kvm_irqfd_assign():
> >>> 
> >>>         /*
> >>>          * Check if there was an event already pending on the eventfd
> >>>          * before we registered, and trigger it as if we didn't miss it.
> >>>          */
> >>>         events = f.file->f_op->poll(f.file, &irqfd->pt);
> >>> 
> >>>         if (events & POLLIN)
> >>>                 schedule_work(&irqfd->inject);
> >>> 
> >>> Is this a problem you observe in practice?
> >> 
> >> Thanks for pointing out this code; I wasn't aware of it.
> >> 
> >> Indeed I'm encountering it in practice, and I've checked that my kernel
> >> has the code above.
> >> 
> >> It starts to sound like a race:
> >>   Qemu registers the new notifier with KVM
> >>   Backend kicks the (now no longer registered) maskfd
> > 
> > vhost-user is not supposed to use maskfd at all.
> > 
> > We have this code:
> >     if (net->nc->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
> >         dev->use_guest_notifier_mask = false;
> >     }
> > 
> > Isn't it effective?
> 
> I'm observing this problem when using vhost-user-scsi, not -net, so the code
> above is not in effect. Anyway, I'd expect the race I described to also
> happen on vhost-scsi.
> 
> The problem is aggravated on storage for the following reason:
> SeaBIOS configures the vhost-(user)-scsi device, finds the boot drive and
> reads the boot data.
> Then the guest kernel boots, and the virtio-scsi driver loads and
> reconfigures the device.
> Qemu sends the new virtq information to the backend, but as soon as the
> device status is OK the guest sends reads to the root disk.
> And if the irq is lost, the guest will wait for a response forever before
> making progress.
> 
> Unlike networking (which must cope with packet drops), the guest hangs
> waiting for the device to answer.
> 
> So even if you had this race in networking, the guest would eventually
> retransmit, which would hide the issue.
> 
> Thoughts?
> Felipe
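(For reference, the "spurious notification" proposal described above amounts to
roughly the following in a backend's SET_VRING_CALL handler. This is only a
sketch: the struct and helper names are made up for illustration and it is not
the libvhost-user implementation.)

    #include <unistd.h>
    #include <sys/eventfd.h>

    struct my_vq {
        int kick_fd;
        int call_fd;    /* eventfd received via VHOST_USER_SET_VRING_CALL */
    };

    /* Hypothetical handler invoked when the backend gets SET_VRING_CALL. */
    static void my_handle_set_vring_call(struct my_vq *vq, int new_call_fd)
    {
        if (vq->call_fd != -1) {
            close(vq->call_fd);     /* old fd may be the masked one */
        }
        vq->call_fd = new_call_fd;

        if (vq->call_fd != -1) {
            /*
             * Notify once on the freshly received fd.  If the backend already
             * completed a request and signalled the previous (masked) fd, the
             * guest still gets an interrupt; if nothing was pending, the
             * driver just sees a spurious notification, which virtio allows.
             */
            eventfd_write(vq->call_fd, 1);
        }
    }

Notifying once on every newly received call fd covers the window in the
scenario above: whichever fd is live after step 4.d gets at least one event
after the backend has seen the request.
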
maskfd is just racy for vhost-user ATM. I'm guessing vhost-scsi should just
set use_guest_notifier_mask, and that will fix it. Alternatively, rework
masking to support sync with the backend - but I doubt it's useful.

> 
> 
> >
> >
> >> Qemu sends the new callfd to the application
> >> 
> >> It's not hard to repro. How could this situation be avoided?
> >> 
> >> Cheers,
> >> Felipe
> >> 
> >> 
> >>> 
> >>>> 
> >>>>> 
> >>>>>>> 
> >>>>>>> Perhaps it's best for now to delay the callfd notification with a
> >>>>>>> flag until it is received?
> >>>>>> 
> >>>>>> The other idea is to always kick when we receive the callfd. I
> >>>>>> remember discussing that alternative with you before libvhost-user
> >>>>>> went in. The protocol says both the driver and the backend must handle
> >>>>>> spurious kicks. This approach also fixes the bug.
> >>>>>> 
> >>>>>> I'm happy with whatever alternative you want, as long as it makes
> >>>>>> libvhost-user usable for storage devices.
> >>>>>> 
> >>>>>> Thanks,
> >>>>>> Felipe
> >>>>>> 
> >>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>>> Signed-off-by: Felipe Franciosi <fel...@nutanix.com>
> >>>>>>>> ---
> >>>>>>>>  contrib/libvhost-user/libvhost-user.c | 26 +++++++++++++-------------
> >>>>>>>>  1 file changed, 13 insertions(+), 13 deletions(-)
> >>>>>>>> 
> >>>>>>>> diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> >>>>>>>> index af4faad..a46ef90 100644
> >>>>>>>> --- a/contrib/libvhost-user/libvhost-user.c
> >>>>>>>> +++ b/contrib/libvhost-user/libvhost-user.c
> >>>>>>>> @@ -607,19 +607,6 @@ vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
> >>>>>>>>          DPRINT("Got kick_fd: %d for vq: %d\n", vmsg->fds[0], index);
> >>>>>>>>      }
> >>>>>>>> 
> >>>>>>>> -    dev->vq[index].started = true;
> >>>>>>>> -    if (dev->iface->queue_set_started) {
> >>>>>>>> -        dev->iface->queue_set_started(dev, index, true);
> >>>>>>>> -    }
> >>>>>>>> -
> >>>>>>>> -    if (dev->vq[index].kick_fd != -1 && dev->vq[index].handler) {
> >>>>>>>> -        dev->set_watch(dev, dev->vq[index].kick_fd, VU_WATCH_IN,
> >>>>>>>> -                       vu_kick_cb, (void *)(long)index);
> >>>>>>>> -
> >>>>>>>> -        DPRINT("Waiting for kicks on fd: %d for vq: %d\n",
> >>>>>>>> -               dev->vq[index].kick_fd, index);
> >>>>>>>> -    }
> >>>>>>>> -
> >>>>>>>>      return false;
> >>>>>>>>  }
> >>>>>>>> 
> >>>>>>>> @@ -661,6 +648,19 @@ vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
> >>>>>>>> 
> >>>>>>>>      DPRINT("Got call_fd: %d for vq: %d\n", vmsg->fds[0], index);
> >>>>>>>> 
> >>>>>>>> +    dev->vq[index].started = true;
> >>>>>>>> +    if (dev->iface->queue_set_started) {
> >>>>>>>> +        dev->iface->queue_set_started(dev, index, true);
> >>>>>>>> +    }
> >>>>>>>> +
> >>>>>>>> +    if (dev->vq[index].kick_fd != -1 && dev->vq[index].handler) {
> >>>>>>>> +        dev->set_watch(dev, dev->vq[index].kick_fd, VU_WATCH_IN,
> >>>>>>>> +                       vu_kick_cb, (void *)(long)index);
> >>>>>>>> +
> >>>>>>>> +        DPRINT("Waiting for kicks on fd: %d for vq: %d\n",
> >>>>>>>> +               dev->vq[index].kick_fd, index);
> >>>>>>>> +    }
> >>>>>>>> +
> >>>>>>>>      return false;
> >>>>>>>>  }
> >>>>>>>> 
> >>>>>>>> --
> >>>>>>>> 1.9.4
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>
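
For illustration, the use_guest_notifier_mask route suggested at the top of
this reply would mirror the -net special case quoted earlier. The sketch below
uses stand-in types rather than QEMU's real vhost_dev and vhost-user-scsi
code, so treat it as a picture of where the flag would be cleared, not as the
actual change:

    #include <stdbool.h>

    /* Stand-ins for QEMU's vhost_dev and backend type, illustration only. */
    enum backend_kind { BACKEND_VHOST_KERNEL, BACKEND_VHOST_USER };

    struct fake_vhost_dev {
        enum backend_kind backend;
        bool use_guest_notifier_mask;   /* mirrors the real vhost_dev field */
    };

    static void setup_notifier_masking(struct fake_vhost_dev *dev)
    {
        /* Default: mask by swapping in a "masked" call eventfd. */
        dev->use_guest_notifier_mask = true;

        if (dev->backend == BACKEND_VHOST_USER) {
            /*
             * With an out-of-process vhost-user backend (net today, and scsi
             * per the suggestion above), the fd swap races with the backend:
             * it can signal the old, masked fd after a new one has already
             * been sent.  Clearing the flag keeps the real callfd installed.
             */
            dev->use_guest_notifier_mask = false;
        }
    }

With the flag cleared, the backend always holds the live callfd and QEMU
handles interrupt masking on its own side, so the backend never signals an fd
that has already been swapped out.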