Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coque...@redhat.com>
> Sent: Friday, April 16, 2021 4:34 PM
> To: Hu, Jiayu <jiayu...@intel.com>; dev@dpdk.org
> Cc: Xia, Chenbo <chenbo....@intel.com>; Wang, Yinan
> <yinan.w...@intel.com>; Pai G, Sunil <sunil.pa...@intel.com>; Jiang, Cheng1
> <cheng1.ji...@intel.com>
> Subject: Re: [PATCH v2 3/4] vhost: avoid deadlock on async register
>
>
>
> On 4/16/21 10:18 AM, Hu, Jiayu wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coque...@redhat.com>
> >> Sent: Friday, April 16, 2021 2:35 PM
> >> To: Hu, Jiayu <jiayu...@intel.com>; dev@dpdk.org
> >> Cc: Xia, Chenbo <chenbo....@intel.com>; Wang, Yinan
> >> <yinan.w...@intel.com>; Pai G, Sunil <sunil.pa...@intel.com>; Jiang, Cheng1
> >> <cheng1.ji...@intel.com>
> >> Subject: Re: [PATCH v2 3/4] vhost: avoid deadlock on async register
> >>
> >> Hi Jiayu,
> >>
> >> On 4/16/21 4:19 AM, Hu, Jiayu wrote:
> >>> Hi Maxime,
> >>>>>> On 4/14/21 3:40 AM, Hu, Jiayu wrote:
> >>>>>>> Hi Maxime,
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Maxime Coquelin <maxime.coque...@redhat.com>
> >>>>>>>> Sent: Tuesday, April 13, 2021 7:33 PM
> >>>>>>>> To: Hu, Jiayu <jiayu...@intel.com>; dev@dpdk.org
> >>>>>>>> Cc: Xia, Chenbo <chenbo....@intel.com>; Wang, Yinan
> >>>>>>>> <yinan.w...@intel.com>; Pai G, Sunil <sunil.pa...@intel.com>; Jiang, Cheng1
> >>>>>>>> <cheng1.ji...@intel.com>
> >>>>>>>> Subject: Re: [PATCH v2 3/4] vhost: avoid deadlock on async register
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 4/2/21 3:04 PM, Jiayu Hu wrote:
> >>>>>>>>> Users can register async copy device in vring_state_changed(),
> >>>>>>>>> when vhost queue is enabled.
> >>>>>>>>> However, a deadlock occurs inside
> >>>>>>>>> rte_vhost_async_channel_register(), if
> >>>>>>>>> VHOST_USER_F_PROTOCOL_FEATURES
> >>>>>>>>> is not supported, as vhost_user_msg_handler() takes vq->access_lock
> >>>>>>>>> before calling vhost_user_set_vring_kick().
> >>>>>>>>>
> >>>>>>>>> This patch avoids the async register deadlock by removing the call
> >>>>>>>>> to vring_state_changed() in vhost_user_set_vring_kick(). It's safe
> >>>>>>>>> as vhost_user_msg_handler() will call vring_state_changed() anyway.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Jiayu Hu <jiayu...@intel.com>
> >>>>>>>>> ---
> >>>>>>>>>  lib/librte_vhost/vhost_user.c | 3 ---
> >>>>>>>>>  1 file changed, 3 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> >>>>>>>>> index 44c0452..8f0eba6 100644
> >>>>>>>>> --- a/lib/librte_vhost/vhost_user.c
> >>>>>>>>> +++ b/lib/librte_vhost/vhost_user.c
> >>>>>>>>> @@ -1918,9 +1918,6 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
> >>>>>>>>>  	 */
> >>>>>>>>>  	if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
> >>>>>>>>>  		vq->enabled = true;
> >>>>>>>>> -		if (dev->notify_ops->vring_state_changed)
> >>>>>>>>> -			dev->notify_ops->vring_state_changed(
> >>>>>>>>> -					dev->vid, file.index, 1);
> >>>>>>>>>  	}
> >>>>>>>>>
> >>>>>>>>>  	if (vq->ready) {
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> As replied earlier on v1, I agree the call to vring_state_changed
> >>>>>>>> here is not needed. But it might not be enough; there are other
> >>>>>>>> cases where you could have issues.
> >>>>>>>
> >>>>>>> vhost_user_notify_queue_state() can be called in three cases:
> >>>>>>> 1. When the vq ready status changes, vhost_user_msg_handler() calls
> >>>>>>> it to notify the backend. But vhost_user_msg_handler() doesn't take
> >>>>>>> the lock before calling it. So in this case, no deadlock occurs in
> >>>>>>> async register.
> >>>>>>>
> >>>>>>> 2.
> >>>>>>> If vq->ready is true, vhost_user_set_vring_call() calls it to
> >>>>>>> notify the backend that the vq is not enabled. Although
> >>>>>>> vhost_user_set_vring_call() is protected by the lock, async register
> >>>>>>> is called only if the vq is enabled, so async register will not be
> >>>>>>> called in this case.
> >>>>>>>
> >>>>>>> 3. If vq->ready is true, vhost_user_set_vring_kick() calls it to
> >>>>>>> notify the backend that the vq is not enabled. Same as #2, async
> >>>>>>> register is called only when the vq is enabled. Even if
> >>>>>>> vhost_user_set_vring_kick() is protected by the lock, there is no
> >>>>>>> deadlock in async register, as it will not be called in this case.
> >>>>>>>
> >>>>>>> In summary, I think there is no deadlock issue in async register if
> >>>>>>> we can remove the call to vring_state_change() in
> >>>>>>> vhost_user_set_vring_kick().
> >>>>>>
> >>>>>>
> >>>>>> But the unregister one could be called in theory, no? Otherwise it
> >>>>>> would look unbalanced. At least on the disabled notification, the app
> >>>>>> should make sure the DMA transfers to and from the vring are stopped
> >>>>>> before it returns from the callback. Otherwise it could lead to
> >>>>>> undefined behavior.
> >>>>>
> >>>>> Right, users need to call unregister, but we cannot remove the call to
> >>>>> vhost_user_notify_queue_state() in cases #2 and #3, IMHO. So to avoid
> >>>>> deadlock, we recommend users call async unregister in
> >>>>> destroy_device(), instead of on the vring disabled notification. Does
> >>>>> it make sense to you?
> >>>>
> >>>> Calling async unregister in destroy_device() is fine by me. But I'm
> >>>> more concerned about DMA transactions not being stopped when the ring
> >>>> becomes disabled.
> >>> If the ring becomes disabled, no more new pkts go into the async data
> >>> path, as virtio_dev_rx_async_submit() returns if vq->enabled is false.
> >>>
> >>>> I cannot say if you are doing it right, because the vhost example does
> >>>> not implement the vring_state_changed callback.
> >>>> It is not a problem with the sync datapath as we have the lock
> >>>> protection + enabled variable that prevents processing the rings when
> >>>> it gets stopped.
> >>>> But for the async path, if you have programmed DMA transfers, you need
> >>>> to rely on the vring_state_changed callback to block the control path
> >>>> while the transfers are cancelled or done.
> >>>
> >>> I am not sure if I understand your concern correctly, but for the async
> >>> data path, the enabled variable can prevent it from enqueueing new pkts
> >>> when the ring is disabled, as virtio_dev_rx_async_submit() checks the
> >>> enabled variable before processing the ring; in addition, lock
> >>> protection can stop the async data path, as
> >>> virtio_dev_rx_async_submit() acquires the lock too. For example, in the
> >>> case that the front-end updates the kickfd, set_vring_kick() will notify
> >>> the backend that the vring is stopped by calling
> >>> vhost_user_notify_queue_state(). In this case, the sync data path will
> >>> stop as a result of lock protection, and I think it's the same for the
> >>> async data path, as it acquires the lock before processing the ring.
> >>
> >>
> >> What about memory hotplug happening while the DMA transfers are on-going
> >> for example?
> >>
> >> In this case, the lock is enough for the sync datapath, but it is not for
> >> the async one.
> >>
> >> If a VHOST_USER_SET_MEM_TABLE request is received while in
> >> virtio_dev_rx_async_submit(), it will be blocked until
> >> virtio_dev_rx_async_submit() returns, but the DMA transfers may not be
> >> finished yet.
> >> When unblocked, the control thread will call vhost_user_set_mem_table(),
> >> which will munmap the current memory regions before mmapping the new
> >> ones while DMA transfers are on-going.
> >>
> >> To fix this, we need to call the vring_state_changed callback for all
> >> the virtqueues with disabled state. In your app, you need to stop DMA
> >> transfers when the disabled state notification happens, or block the
> >> callback until the transfers are done.
> >
> > Let me summarize the problem:
> > when frontend memory is hot-plugged, the host application needs to stop
> > DMA transfers for all virtqueues, which is done by emptying in-flight
> > DMA copies and unregistering async inside vring_state_changed().
>
> I don't think unregistering async channels is mandatory.
> On the disable notification, the callback has to block until all on-going
> DMA transfers are done. New transfers won't be scheduled since
> virtio_dev_rx_async_submit() will be blocked by the lock while the
> memory regions update is being done.
Yes, unregister is not a must, but the problem is that
rte_vhost_poll_enqueue_completed() takes the lock too. In the VM memory
hot-plug case, emptying in-flight pkts in vring_state_changed() may be OK,
as it is called by vhost_user_msg_handler() (please correct me if I am
wrong), which doesn't take the lock. But if it is called inside
set_vring_kick()/_call(), a deadlock occurs in
rte_vhost_poll_enqueue_completed(), even if the user app doesn't call the
async unregister API.

> > But
> > the vhost library takes the lock before entering vring_state_changed(),
> > and async unregister and rte_vhost_poll_enqueue_completed() both
> > acquire the lock, so a deadlock occurs inside
> > rte_vhost_poll_enqueue_completed() or async unregister. (please correct
> > me if I am wrong)
> >
> > There may be two possible solutions:
> > 1. remove the lock before calling vring_state_change() in the vhost
> > library;
>
> I don't think so. If you release the lock in vring_state_change(), it
> will enable the app to enqueue/dequeue descriptors in the middle of
> handling the request.

Sorry, I didn't express my opinion clearly. I mean removing the following
code in set_vring_kick() and set_vring_call():

-	if (vq->ready) {
-		vq->ready = false;
-		vhost_user_notify_queue_state(dev, file.index, 0);
-	}

IMHO, in both cases, the purpose of calling
vhost_user_notify_queue_state() is to tell the backend that the vring is
not ready to use as a result of the callfd/kickfd update; after that,
vhost_user_msg_handler() calls it again to notify the backend that the
vring is ready to process. But there is lock protection before entering
set_vring_kick()/_call(), so data path threads will not process the vring
during the callfd/kickfd update anyway. After the callfd/kickfd update, I
think data path threads are already safe to start processing the vring, so
there is no need for a ready notification in vhost_user_msg_handler().
BTW, lock protection around vhost_user_notify_queue_state() exists only in
set_vring_kick()/_call(), but not in vhost_user_msg_handler().

Thanks,
Jiayu

> > 2.
> > provide a new callback without acquiring the lock for the DMA
> > accelerated case. The callback is used to notify the backend that it
> > needs to stop DMA transfers.
>
> No, in your app just wait for DMA transfers to be finished before
> returning from the callback.
>
> Please implement the vring_state_changed callback in the Vhost example;
> it is now mandatory with the async datapath, and it will help to have an
> example of what real applications should do.
>
> Thanks,
> Maxime
>
> > Thanks,
> > Jiayu