> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Thursday, December 7, 2017 6:02 PM
> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; y...@fridaylinux.org; Bie, Tiwei
> Cc: sta...@dpdk.org; jfrei...@redhat.com
> Subject: Re: [PATCH] vhost_user: protect active rings from async ring changes
>
> On 12/07/2017 10:33 AM, Tan, Jianfeng wrote:
> >
> >> -----Original Message-----
> >> From: Victor Kaplansky [mailto:vkapl...@redhat.com]
> >> Sent: Wednesday, December 6, 2017 9:56 PM
> >> To: dev@dpdk.org; y...@fridaylinux.org; Bie, Tiwei; Tan, Jianfeng; vkapl...@redhat.com
> >> Cc: sta...@dpdk.org; jfrei...@redhat.com; Maxime Coquelin
> >> Subject: [PATCH] vhost_user: protect active rings from async ring changes
> >>
> >> When performing live migration or memory hot-plugging,
> >> the changes to the device and vrings made by message handler
> >> done independently from vring usage by PMD threads.
> >>
> >> This causes for example segfauls during live-migration
> >
> > segfauls ->segfaults?
> >
> >> with MQ enable, but in general virtually any request
> >> sent by qemu changing the state of device can cause
> >> problems.
> >>
> >> These patches fixes all above issues by adding a spinlock
> >> to every vring and requiring message handler to start operation
> >> only after ensuring that all PMD threads related to the divece
> >
> > Another typo: divece.
> >
> >> are out of critical section accessing the vring data.
> >>
> >> Each vring has its own lock in order to not create contention
> >> between PMD threads of different vrings and to prevent
> >> performance degradation by scaling queue pair number.
> >
> > Also wonder how much overhead it brings.
> >
> > Instead of locking each vring, can we just, waiting a while (10us for
> > example) after call destroy_device() callback so that every PMD thread
> > has enough time to skip out the criterial area?
>
> No, because we are not destroying the device when it is needed.
> Actually, once destroy_device() is called, it is likely that the
> application has taken care the ring aren't being processed anymore
> before returning from the callback (This is at least the case with Vhost
> PMD).
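
For reference, a minimal sketch of the per-vring locking scheme the patch above
describes. The names used here (struct vring_ctx, access_lock, the two helper
functions) are placeholders for illustration, not the actual patch code; only
the rte_spinlock_*() calls are existing DPDK API.

#include <stdint.h>
#include <rte_spinlock.h>

/* Placeholder per-vring context; the real vhost virtqueue struct differs. */
struct vring_ctx {
	rte_spinlock_t access_lock;	/* one lock per vring */
	/* ring pointers, avail/used indices, ... */
};

/* PMD data path: hold the lock of this vring only while touching it. */
static uint16_t
pmd_process_ring(struct vring_ctx *vq)
{
	uint16_t done = 0;

	rte_spinlock_lock(&vq->access_lock);
	/* enqueue/dequeue burst on this vring */
	rte_spinlock_unlock(&vq->access_lock);

	return done;
}

/* Message handler: take every vring lock before changing device state
 * (e.g. on SET_MEM_TABLE), so that no PMD thread is inside a ring while
 * its backing memory is unmapped/remapped. */
static void
handle_state_change(struct vring_ctx *vqs, uint32_t nr_vring)
{
	uint32_t i;

	for (i = 0; i < nr_vring; i++)
		rte_spinlock_lock(&vqs[i].access_lock);

	/* apply the vhost-user request here */

	for (i = 0; i < nr_vring; i++)
		rte_spinlock_unlock(&vqs[i].access_lock);
}

Keeping one lock per vring means PMD threads serving different queue pairs
never contend with each other; a data-path thread only contends with the
occasional control-path message for its own ring.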
OK, I did not put it the right way, as there are multiple cases above: migration
and memory hot-plug. Let me try again: whenever a vhost thread handles a message
affecting PMD threads (like SET_MEM_TABLE, GET_VRING_BASE, etc.), we can clear
the dev flag - VIRTIO_DEV_RUNNING - and wait for a while so that PMD threads
skip out of the critical sections. After message handling, set the flag
- VIRTIO_DEV_RUNNING - again. I suppose it can work, based on the assumption
that PMD threads work in polling mode and will leave the critical section
quickly and inevitably.

>
> The reason we need the lock is to protect PMD threads from the handling
> of some vhost-user protocol requests.
> For example SET_MEM_TABLE in the case of memory hotplug, or SET_LOG_BASE
> in case of multiqueue, which is sent for every queue pair and results in
> unmapping/remapping the logging area.

Yes, I understand how it fails.

Thanks,
Jianfeng

>
> Maxime
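
A rough sketch of the flag-based alternative proposed above. VIRTIO_DEV_RUNNING
is the flag mentioned in the mail; everything else here (struct dev_ctx,
pmd_poll(), handle_request(), the 10us value) is a placeholder for illustration,
not the actual vhost code.

#include <stdint.h>
#include <unistd.h>	/* usleep() */

#define VIRTIO_DEV_RUNNING 1u	/* placeholder bit standing in for the real flag */

/* Placeholder device context; the real virtio_net struct differs. */
struct dev_ctx {
	volatile uint32_t flags;
};

/* PMD thread, polling mode: bail out when the device is not marked running. */
static void
pmd_poll(struct dev_ctx *dev)
{
	if (!(dev->flags & VIRTIO_DEV_RUNNING))
		return;
	/* process the rings of this device */
}

/* Message handler: clear the flag, give polling threads time to leave the
 * critical section, apply the request, then set the flag again. */
static void
handle_request(struct dev_ctx *dev)
{
	dev->flags &= ~VIRTIO_DEV_RUNNING;

	/* grace period; 10us is the value floated in this thread, not a
	 * verified upper bound on one PMD iteration */
	usleep(10);

	/* handle SET_MEM_TABLE / GET_VRING_BASE / ... here */

	dev->flags |= VIRTIO_DEV_RUNNING;
}

As noted above, this leans on the assumption that every PMD iteration is short
and checks the flag; unlike the per-vring lock, it gives no hard guarantee that
a thread has left the ring before the request is applied.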