Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coque...@redhat.com>
> Sent: Thursday, April 15, 2021 3:09 PM
> To: Hu, Jiayu <jiayu...@intel.com>; dev@dpdk.org
> Cc: Xia, Chenbo <chenbo....@intel.com>; Wang, Yinan
> <yinan.w...@intel.com>; Pai G, Sunil <sunil.pa...@intel.com>; Jiang, Cheng1
> <cheng1.ji...@intel.com>
> Subject: Re: [PATCH v2 3/4] vhost: avoid deadlock on async register
> 
> 
> 
> On 4/15/21 3:08 AM, Hu, Jiayu wrote:
> >
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coque...@redhat.com>
> >> Sent: Wednesday, April 14, 2021 6:09 PM
> >> To: Hu, Jiayu <jiayu...@intel.com>; dev@dpdk.org
> >> Cc: Xia, Chenbo <chenbo....@intel.com>; Wang, Yinan
> >> <yinan.w...@intel.com>; Pai G, Sunil <sunil.pa...@intel.com>; Jiang,
> Cheng1
> >> <cheng1.ji...@intel.com>
> >> Subject: Re: [PATCH v2 3/4] vhost: avoid deadlock on async register
> >>
> >>
> >>
> >> On 4/14/21 3:40 AM, Hu, Jiayu wrote:
> >>> Hi Maxime,
> >>>
> >>>> -----Original Message-----
> >>>> From: Maxime Coquelin <maxime.coque...@redhat.com>
> >>>> Sent: Tuesday, April 13, 2021 7:33 PM
> >>>> To: Hu, Jiayu <jiayu...@intel.com>; dev@dpdk.org
> >>>> Cc: Xia, Chenbo <chenbo....@intel.com>; Wang, Yinan
> >>>> <yinan.w...@intel.com>; Pai G, Sunil <sunil.pa...@intel.com>; Jiang,
> >> Cheng1
> >>>> <cheng1.ji...@intel.com>
> >>>> Subject: Re: [PATCH v2 3/4] vhost: avoid deadlock on async register
> >>>>
> >>>>
> >>>>
> >>>> On 4/2/21 3:04 PM, Jiayu Hu wrote:
> >>>>> Users can register async copy device in vring_state_changed(),
> >>>>> when vhost queue is enabled. However, a deadlock occurs inside
> >>>>> rte_vhost_async_channel_register(), if
> >>>> VHOST_USER_F_PROTOCOL_FEATURES
> >>>>> is not supported, as vhost_user_msg_handler() takes vq->access_lock
> >>>>> before calling vhost_user_set_vring_kick().
> >>>>>
> >>>>> This patch avoids async register deadlock by removing calling
> >>>>> vring_state_changed() in vhost_user_set_vring_kick(). It's safe
> >>>>> as vhost_user_msg_handler() will call vring_state_changed() anyway.
> >>>>>
> >>>>> Signed-off-by: Jiayu Hu <jiayu...@intel.com>
> >>>>> ---
> >>>>>  lib/librte_vhost/vhost_user.c | 3 ---
> >>>>>  1 file changed, 3 deletions(-)
> >>>>>
> >>>>> diff --git a/lib/librte_vhost/vhost_user.c
> b/lib/librte_vhost/vhost_user.c
> >>>>> index 44c0452..8f0eba6 100644
> >>>>> --- a/lib/librte_vhost/vhost_user.c
> >>>>> +++ b/lib/librte_vhost/vhost_user.c
> >>>>> @@ -1918,9 +1918,6 @@ vhost_user_set_vring_kick(struct virtio_net
> >>>> **pdev, struct VhostUserMsg *msg,
> >>>>>          */
> >>>>>         if (!(dev->features & (1ULL <<
> >>>> VHOST_USER_F_PROTOCOL_FEATURES))) {
> >>>>>                 vq->enabled = true;
> >>>>> -               if (dev->notify_ops->vring_state_changed)
> >>>>> -                       dev->notify_ops->vring_state_changed(
> >>>>> -                               dev->vid, file.index, 1);
> >>>>>         }
> >>>>>
> >>>>>         if (vq->ready) {
> >>>>>
> >>>>
> >>>> As replied earlier on v1, I agree the call to vring_state_changed here
> >>>> is not needed. But it might not be enough, there are other cases where
> >>>> you could have issues.
> >>>
> >>> vhost_user_notify_queue_state() can be called in three cases:
> >>> 1. when vq ready status changes, vhost_user_msg_handler() calls it to
> >> notify
> >>> backend. But vhost_user_msg_handler() doesn't take lock before calling
> it.
> >>> So in this case, no deadlock occurs in async register.
> >>>
> >>> 2. if vq->ready is true, vhost_user_set_vring_call() calls it to notify
> backend
> >>> vq is not enabled. Although vhost_user_set_vring_call() is protected by
> lock,
> >>> async register is called only if vq is enabled, so async register will 
> >>> not be
> >> called
> >>> in this case.
> >>>
> >>> 3. If vq->ready is true, vhost_user_set_vring_kick() calls it to notify
> backend
> >>> vq is not enabled. Same as #2, async register is called only when vq is
> >> enabled.
> >>> Even if vhost_user_set_vring_kick() is protected by lock, there is no
> >> deadlock in
> >>> async register, as it will not be called in this case.
> >>>
> >>> In summary,  I think there is no deadlock issue in async register if we
> >>> can remove calling vring_state_change() in vhost_user_set_vring_kick().
> >>
> >>
> >> But unregister one could be called in theory no? Otherwise it would look
> >> unbalanced. At least on disabled notification, the app should make sure
> >> the DMA transfers to and from the vring are stopped before it returns
> >> from the callabck. Otherwise it could lead to undefined behavior.
> >
> > Right, users need to call unregister, but we cannot remove calling
> > vhost_user_notify_queue_state() in case #2 and #3, IMHO. So to
> > avoid deadlock, we recommended users to call async unregister in
> > destroy_device(), instead of on vring disabled notification. Does it
> > make sense to you?
> 
> Calling async unregister in destroy device is fine by me. But I'm more
> concerned about DMA transations not being stopped when the ring becomes
> disabled.
If ring becomes disabled, no more new pkts go into async data path, as
virtio_dev_rx_async_submit() returns if vq->enabled is false.

> 
> I cannot say if you are doing it right, because the vhost example does
> not implement the vring_state_changed callback.
> It is not a problem with the sync datapath as we have the lock
> protection + enabled variable that prevents to process the rings when it
> gets stopped.
> But for the async path, if you have programmed DMA transfers, you need
> to rely on the vring_state_change to block the control path while the
> transfers are cancelled or done.

I am not sure if I understand your concern correctly, but for async data path,
enable variable can prevent it from enqueue new pkts when ring is disabled, as
virtio_dev_rx_async_submit() check enable variable before processing ring;
in addition, lock protection can stop async data path as 
virtio_dev_rx_async_submit()
acquires lock too. For example, in the case that front-end updates kickfd,
set_vring_kick() will notify backend that vring is stopped by calling
vhost_user_notify_queue_state(). In this case, sync data path will stop as a 
result of
lock protection, and I think it's the same for async data path, as it acquires 
lock before
processing ring.

Thanks,
Jiayu

> 
> >>
> >>>>
> >>>> Please add stable and Fixes tag.
> >>>
> >>> Do you suggest to make the patch as a fix for 8639d54563a
> >>> ("vhost: introduce async enqueue registration API")? But the
> >>> thing is that code removed in this patch is not introduced
> >>> by this commit.
> >>
> >> The commit you need to point to is the one introducing the
> >> .vring_state_changed() call.
> >
> > So this patch is still a fix for deadlock on async register? Or it is
> > a fix for unnecessary .vring_state_changed() call?
> 
> You made the point that the vring_state_changed() call was not necessary
> in any case. So it can fix the commit introducing it.
> 
> > Thanks,
> > Jiayu
> >
> >>
> >>> Thanks,
> >>> Jiayu
> >>>>
> >>>> Thanks,
> >>>> Maxime
> >>>
> >

Reply via email to