Hi Maxime, > -----Original Message----- > From: Maxime Coquelin <maxime.coque...@redhat.com> > Sent: Thursday, April 15, 2021 3:09 PM > To: Hu, Jiayu <jiayu...@intel.com>; dev@dpdk.org > Cc: Xia, Chenbo <chenbo....@intel.com>; Wang, Yinan > <yinan.w...@intel.com>; Pai G, Sunil <sunil.pa...@intel.com>; Jiang, Cheng1 > <cheng1.ji...@intel.com> > Subject: Re: [PATCH v2 3/4] vhost: avoid deadlock on async register > > > > On 4/15/21 3:08 AM, Hu, Jiayu wrote: > > > > > >> -----Original Message----- > >> From: Maxime Coquelin <maxime.coque...@redhat.com> > >> Sent: Wednesday, April 14, 2021 6:09 PM > >> To: Hu, Jiayu <jiayu...@intel.com>; dev@dpdk.org > >> Cc: Xia, Chenbo <chenbo....@intel.com>; Wang, Yinan > >> <yinan.w...@intel.com>; Pai G, Sunil <sunil.pa...@intel.com>; Jiang, > Cheng1 > >> <cheng1.ji...@intel.com> > >> Subject: Re: [PATCH v2 3/4] vhost: avoid deadlock on async register > >> > >> > >> > >> On 4/14/21 3:40 AM, Hu, Jiayu wrote: > >>> Hi Maxime, > >>> > >>>> -----Original Message----- > >>>> From: Maxime Coquelin <maxime.coque...@redhat.com> > >>>> Sent: Tuesday, April 13, 2021 7:33 PM > >>>> To: Hu, Jiayu <jiayu...@intel.com>; dev@dpdk.org > >>>> Cc: Xia, Chenbo <chenbo....@intel.com>; Wang, Yinan > >>>> <yinan.w...@intel.com>; Pai G, Sunil <sunil.pa...@intel.com>; Jiang, > >> Cheng1 > >>>> <cheng1.ji...@intel.com> > >>>> Subject: Re: [PATCH v2 3/4] vhost: avoid deadlock on async register > >>>> > >>>> > >>>> > >>>> On 4/2/21 3:04 PM, Jiayu Hu wrote: > >>>>> Users can register async copy device in vring_state_changed(), > >>>>> when vhost queue is enabled. However, a deadlock occurs inside > >>>>> rte_vhost_async_channel_register(), if > >>>> VHOST_USER_F_PROTOCOL_FEATURES > >>>>> is not supported, as vhost_user_msg_handler() takes vq->access_lock > >>>>> before calling vhost_user_set_vring_kick(). > >>>>> > >>>>> This patch avoids async register deadlock by removing calling > >>>>> vring_state_changed() in vhost_user_set_vring_kick(). It's safe > >>>>> as vhost_user_msg_handler() will call vring_state_changed() anyway. > >>>>> > >>>>> Signed-off-by: Jiayu Hu <jiayu...@intel.com> > >>>>> --- > >>>>> lib/librte_vhost/vhost_user.c | 3 --- > >>>>> 1 file changed, 3 deletions(-) > >>>>> > >>>>> diff --git a/lib/librte_vhost/vhost_user.c > b/lib/librte_vhost/vhost_user.c > >>>>> index 44c0452..8f0eba6 100644 > >>>>> --- a/lib/librte_vhost/vhost_user.c > >>>>> +++ b/lib/librte_vhost/vhost_user.c > >>>>> @@ -1918,9 +1918,6 @@ vhost_user_set_vring_kick(struct virtio_net > >>>> **pdev, struct VhostUserMsg *msg, > >>>>> */ > >>>>> if (!(dev->features & (1ULL << > >>>> VHOST_USER_F_PROTOCOL_FEATURES))) { > >>>>> vq->enabled = true; > >>>>> - if (dev->notify_ops->vring_state_changed) > >>>>> - dev->notify_ops->vring_state_changed( > >>>>> - dev->vid, file.index, 1); > >>>>> } > >>>>> > >>>>> if (vq->ready) { > >>>>> > >>>> > >>>> As replied earlier on v1, I agree the call to vring_state_changed here > >>>> is not needed. But it might not be enough, there are other cases where > >>>> you could have issues. > >>> > >>> vhost_user_notify_queue_state() can be called in three cases: > >>> 1. when vq ready status changes, vhost_user_msg_handler() calls it to > >> notify > >>> backend. But vhost_user_msg_handler() doesn't take lock before calling > it. > >>> So in this case, no deadlock occurs in async register. > >>> > >>> 2. if vq->ready is true, vhost_user_set_vring_call() calls it to notify > backend > >>> vq is not enabled. Although vhost_user_set_vring_call() is protected by > lock, > >>> async register is called only if vq is enabled, so async register will > >>> not be > >> called > >>> in this case. > >>> > >>> 3. If vq->ready is true, vhost_user_set_vring_kick() calls it to notify > backend > >>> vq is not enabled. Same as #2, async register is called only when vq is > >> enabled. > >>> Even if vhost_user_set_vring_kick() is protected by lock, there is no > >> deadlock in > >>> async register, as it will not be called in this case. > >>> > >>> In summary, I think there is no deadlock issue in async register if we > >>> can remove calling vring_state_change() in vhost_user_set_vring_kick(). > >> > >> > >> But unregister one could be called in theory no? Otherwise it would look > >> unbalanced. At least on disabled notification, the app should make sure > >> the DMA transfers to and from the vring are stopped before it returns > >> from the callabck. Otherwise it could lead to undefined behavior. > > > > Right, users need to call unregister, but we cannot remove calling > > vhost_user_notify_queue_state() in case #2 and #3, IMHO. So to > > avoid deadlock, we recommended users to call async unregister in > > destroy_device(), instead of on vring disabled notification. Does it > > make sense to you? > > Calling async unregister in destroy device is fine by me. But I'm more > concerned about DMA transations not being stopped when the ring becomes > disabled. If ring becomes disabled, no more new pkts go into async data path, as virtio_dev_rx_async_submit() returns if vq->enabled is false.
> > I cannot say if you are doing it right, because the vhost example does > not implement the vring_state_changed callback. > It is not a problem with the sync datapath as we have the lock > protection + enabled variable that prevents to process the rings when it > gets stopped. > But for the async path, if you have programmed DMA transfers, you need > to rely on the vring_state_change to block the control path while the > transfers are cancelled or done. I am not sure if I understand your concern correctly, but for async data path, enable variable can prevent it from enqueue new pkts when ring is disabled, as virtio_dev_rx_async_submit() check enable variable before processing ring; in addition, lock protection can stop async data path as virtio_dev_rx_async_submit() acquires lock too. For example, in the case that front-end updates kickfd, set_vring_kick() will notify backend that vring is stopped by calling vhost_user_notify_queue_state(). In this case, sync data path will stop as a result of lock protection, and I think it's the same for async data path, as it acquires lock before processing ring. Thanks, Jiayu > > >> > >>>> > >>>> Please add stable and Fixes tag. > >>> > >>> Do you suggest to make the patch as a fix for 8639d54563a > >>> ("vhost: introduce async enqueue registration API")? But the > >>> thing is that code removed in this patch is not introduced > >>> by this commit. > >> > >> The commit you need to point to is the one introducing the > >> .vring_state_changed() call. > > > > So this patch is still a fix for deadlock on async register? Or it is > > a fix for unnecessary .vring_state_changed() call? > > You made the point that the vring_state_changed() call was not necessary > in any case. So it can fix the commit introducing it. > > > Thanks, > > Jiayu > > > >> > >>> Thanks, > >>> Jiayu > >>>> > >>>> Thanks, > >>>> Maxime > >>> > >