> -----Original Message----- > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] > Sent: Wednesday, March 7, 2018 11:09 AM > To: Wodkowski, PawelX <pawelx.wodkow...@intel.com>; Kulasek, TomaszX > <tomaszx.kula...@intel.com>; y...@fridaylinux.org > Cc: Verkamp, Daniel <daniel.verk...@intel.com>; Harris, James R > <james.r.har...@intel.com>; dev@dpdk.org; Stojaczyk, DariuszX > <dariuszx.stojac...@intel.com> > Subject: Re: [dpdk-dev] [PATCH] vhost: stop device before updating public > vring > data > > > > On 03/07/2018 10:16 AM, Wodkowski, PawelX wrote: > >> -----Original Message----- > >> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] > >> Sent: Tuesday, March 6, 2018 5:27 PM > >> To: Kulasek, TomaszX <tomaszx.kula...@intel.com>; y...@fridaylinux.org > >> Cc: Verkamp, Daniel <daniel.verk...@intel.com>; Harris, James R > >> <james.r.har...@intel.com>; Wodkowski, PawelX > >> <pawelx.wodkow...@intel.com>; dev@dpdk.org; Stojaczyk, DariuszX > >> <dariuszx.stojac...@intel.com> > >> Subject: Re: [dpdk-dev] [PATCH] vhost: stop device before updating public > vring > >> data > >> > >> Hi Tomasz, > >> > >> On 03/05/2018 05:11 PM, Tomasz Kulasek wrote: > >>> For now DPDK assumes that callfd, kickfd and last_idx are being set just > >>> once during vring initialization and device cannot be running while DPDK > >>> receives SET_VRING_KICK, SET_VRING_CALL and SET_VRING_BASE > messages. > >>> However, that assumption is wrong. For Vhost SCSI messages might arrive > >>> at any point of time, possibly multiple times, one after another. > >>> > >>> QEMU issues SET_VRING_CALL once during device initialization, then again > >>> during device start. The second message will close previous callfd, > >>> which is still being used by the user-implementation of vhost device. > >>> This results in writing to invalid (closed) callfd. > >>> > >>> Other messages like SET_FEATURES, SET_VRING_ADDR etc also will > change > >>> internal state of VQ or device. To prevent race condition device should > >>> also be stopped before updateing vring data. > >>> > >>> Signed-off-by: Dariusz Stojaczyk<dariuszx.stojac...@intel.com> > >>> Signed-off-by: Pawel Wodkowski<pawelx.wodkow...@intel.com> > >>> Signed-off-by: Tomasz Kulasek<tomaszx.kula...@intel.com> > >>> --- > >>> lib/librte_vhost/vhost_user.c | 40 > >> ++++++++++++++++++++++++++++++++++++++++ > >>> 1 file changed, 40 insertions(+) > >> > >> In last release, we have introduced a per-virtqueue lock to protect > >> vring handling against asynchronous device changes. > >> > >> I think that would solve the issue you are facing, but you would need > >> to export the VQs locking functions to the vhost-user lib API to be > >> able to use it. > >> > >> I don't think your current patch is the right solution anyway, because > >> it destroys the device in case we don't want it to remain alive, like > >> set_log_base, or set_features when only the logging feature gets > >> enabled. > > > > Please correct me if I can't see something obvious, but how this lock > > protect > against eg > > SET_MEM_TABLE message? Current flow you are thinking of is: > > > > DPDK vhost-user thread > > 1.1. vhost_user_lock_all_queue_pairs() > > 1.2. vhost_user_set_mem_table() > > 1.3. vhost_user_unlock_all_queue_pairs() > > > > BACKEND: virito-net: > > 2.1. rte_spinlock_lock(&vq->access_lock); > > 2.2. Process vrings and copy all data > > 2.3. rte_spinlock_unlock(&vq->access_lock); > > > > Yes, it will synchronize access to virtio_net structure but what if the > > BACKEND > is in > > zero copy mode and/or pass buffers to physical device? The request will > > not end in 2.2 and you unmap the memory regions in the middle of request. > > Even worse, the physical device will just abort the request but BACKEND can > segfault > > or write random memory because BACKEND try to use invalid memory > address > > (retrieved at request start). > > Right, it doesn't work with zero-copy. > > > To use this per-virtqueue lock: > > 1. the lock need to be held from request start to the end - but this can > > starve > DPDK > > vhost-user thread as there might be many request on-the-fly and when one is > done > > the new one might be started. > > 2. Becouse we don't know if something changed between requst start and > request end > > BACKEND need walk through all descriptors chain at the request end and do > the > > rte_vhost_gpa_to_vva() again. > > > > The SET_MEM_TABLE is most obvious message but the same is true for other > like > > VHOST_IOTLB_INVALIDATE or SET_FEATURES. > > SET_FEATURE should never be sent as soon as the device is started, > except to enable logging. > > For VHOST_IOTLB_INVALIDATE, the solution might be to have a ref counter > per entry, and to only remove it for the cache once it is zero and send > the reply-ack tothe master once this is done. But the cost would be huge > as with large entries, a lot of threads might increment/decrement the > same variable so there will be contention. > > For all other cases, like SET_MEM_TABLE, maybe the solution is to > disable/enable all the queues using the existing ops. > The application or library would have to take care that no guest buffers > are in the wild before returning from the disable. > > Do you think that would work?
What kind of ops can be used to reliably disable all queues and inform backend what changed beside new_device/destroy_device? Those informations are very well hidden inside vhost.c and vhost-user.c files. I think we need new set of ops/callbacks in vhost_device_ops struct that let the backend decide how to behave in response on certain messages like SET_MEM_TABLE. Also I would propose to revert the a3688046995f "vhost: protect active rings from async ring changes" and replace it by behavior of this patch. I see now that this patch need to handle other cases like VHOST_USER_IOTLB_MSG but until we have new set of callbacks I think this is only reliable and fast solution. Pawel > > Cheers, > Maxime > > > Pawel > > > >> > >> Cheers, > >> Maxime