Re: [PATCH] vdpa: Avoid reset when stop device

Jason Wang Wed, 30 Mar 2022 01:57:07 -0700

On Sat, Mar 26, 2022 at 3:59 AM Si-Wei Liu <si-wei....@oracle.com> wrote:
>
>
>
> On 3/25/2022 12:18 AM, Michael Qiu wrote:
> >
> >
> > On 2022/3/25 14:32, Si-Wei Liu Wrote:
> >>
> >>
> >> On 3/23/2022 2:20 AM, Jason Wang wrote:
> >>> Adding Eugenio,  and Ling Shan.
> >>>
> >>> On Wed, Mar 23, 2022 at 4:58 PM <08005...@163.com> wrote:
> >>>> From: Michael Qiu <qiud...@archeros.com>
> >>>>
> >>>> Currently, when VM poweroff, it will trigger vdpa
> >>>> device(such as mlx bluefield2 VF) reset twice, this leads
> >>>> to below issue:
> >>>>
> >>>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
> >>>>
> >>>> This because in vhost_dev_stop(), qemu tries to stop the device,
> >>>> then stop the queue: vhost_virtqueue_stop().
> >>>> In vhost_dev_stop(), it resets the device, which clear some flags
> >>>> in low level driver, and the driver finds
> >>>> that the VQ is invalied, this is the root cause.
> >>>>
> >>>> Actually, device reset will be called within func release()
> >>>>
> >>>> To solve the issue, vdpa should set vring unready, and
> >>>> remove reset ops in device stop: vhost_dev_start(hdev, false).
> >>> This is an interesting issue. Do you see a real issue except for the
> >>> above warnings.
> >>>
> >>> The reason we "abuse" reset is that we don't have a stop uAPI for
> >>> vhost. We plan to add a status bit to stop the whole device in the
> >>> virtio spec, but considering it may take a while maybe we can first
> >>> introduce a new uAPI/ioctl for that.
> >> Yep. What was missing here is a vdpa specific uAPI for per-virtqueue
> >> stop/suspend rather than spec level amendment to stop the whole
> >> device (including both vq and config space). For now we can have vDPA
> >> specific means to control the vq, something vDPA hardware vendor must
> >> support for live migration, e.g. datapath switching to shadow vq. I
> >> believe the spec amendment may follow to define a bit for virtio
> >> feature negotiation later on if needed (FWIW virtio-vdpa already does
> >> set_vq_ready(..., 0) to stop the vq).
> >>
> >> However, there's a flaw in this patch, see below.
> >>>
> >>> Note that the stop doesn't just work for virtqueue but others like,
> >>> e.g config space. But considering we don't have config interrupt
> >>> support right now, we're probably fine.
> >>>
> >>> Checking the driver, it looks to me only the IFCVF's set_vq_ready() is
> >>> problematic, Ling Shan, please have a check. And we probably need a
> >>> workaround for vp_vdpa as well.
> >>>
> >>> Anyhow, this seems to be better than reset. So for 7.1:
> >>>
> >>> Acked-by: Jason Wang <jasow...@redhat.com>
> >>>
> >>>> Signed-off-by: Michael Qiu<qiud...@archeros.com>
> >>>> ---
> >>>>   hw/virtio/vhost-vdpa.c | 8 ++++----
> >>>>   1 file changed, 4 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >>>> index c5ed7a3..d858b4f 100644
> >>>> --- a/hw/virtio/vhost-vdpa.c
> >>>> +++ b/hw/virtio/vhost-vdpa.c
> >>>> @@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct
> >>>> vhost_dev *dev, int idx)
> >>>>       return idx;
> >>>>   }
> >>>>
> >>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> >>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev,
> >>>> unsigned int ready)
> >>>>   {
> >>>>       int i;
> >>>>       trace_vhost_vdpa_set_vring_ready(dev);
> >>>>       for (i = 0; i < dev->nvqs; ++i) {
> >>>>           struct vhost_vring_state state = {
> >>>>               .index = dev->vq_index + i,
> >>>> -            .num = 1,
> >>>> +            .num = ready,
> >>>>           };
> >>>>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> >>>>       }
> >>>> @@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct
> >>>> vhost_dev *dev, bool started)
> >>>>           if (unlikely(!ok)) {
> >>>>               return -1;
> >>>>           }
> >>>> -        vhost_vdpa_set_vring_ready(dev);
> >>>> +        vhost_vdpa_set_vring_ready(dev, 1);
> >>>>       } else {
> >>>> +        vhost_vdpa_set_vring_ready(dev, 0);
> >>>>           ok = vhost_vdpa_svqs_stop(dev);
> >>>>           if (unlikely(!ok)) {
> >>>>               return -1;
> >>>> @@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct
> >>>> vhost_dev *dev, bool started)
> >>>>           memory_listener_register(&v->listener,
> >>>> &address_space_memory);
> >>>>           return vhost_vdpa_add_status(dev,
> >>>> VIRTIO_CONFIG_S_DRIVER_OK);
> >>>>       } else {
> >>>> -        vhost_vdpa_reset_device(dev);
> >> Unfortunately, the reset can't be be removed from here as this code
> >> path usually involves virtio reset or status change for e.g. invoked
> >> via virtio_net_set_status(... , 0). Ideally we should use the
> >> VhostOps.vhost_reset_device() to reset the vhost-vdpa device where
> >> status change is involved after vhost_dev_stop() is done, but this
> >> distinction is not there yet as of today in all of the virtio devices
> >> except vhost_user_scsi.
> >>
> >> Alternatively we may be able to do something like below, stop the
> >> virtqueue in vhost_vdpa_get_vring_base() in the
> >> vhost_virtqueue_stop() context. Only until the hardware vq is
> >> stopped, svq can stop and unmap then vhost-vdpa would reset the
> >> device status. It kinda works, but not in a perfect way...
> As I indicated above, this is an less ideal way to address the issue you
> came across about, without losing functionality or introducing
> regression to the code. Ideally it'd be best to get fixed in a clean
> way, though that would a little more effort in code refactoring.
> Personally I feel that the error message you saw is somewhat benign and
> don't think it caused any real problem. Did you see trouble if living
> with the bogus error message for the moment?


Should be fine for networking devices at least since we don't care
about duplicated packets (restore last_avail_idx as used_idx).

But it might be problematic for types of devices.

Thanks


>
> >>
> >> --- a/hw/virtio/vhost-vdpa.c
> >> +++ b/hw/virtio/vhost-vdpa.c
> >> @@ -564,14 +564,14 @@ static int vhost_vdpa_get_vq_index(struct
> >> vhost_dev *dev, int idx)
> >>       return idx;
> >>   }
> >>
> >> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> >> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, int
> >> enable)
> >>   {
> >>       int i;
> >>       trace_vhost_vdpa_set_vring_ready(dev);
> >>       for (i = 0; i < dev->nvqs; ++i) {
> >>           struct vhost_vring_state state = {
> >>               .index = dev->vq_index + i,
> >> -            .num = 1,
> >> +            .num = enable,
> >>           };
> >>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> >>       }
> >> @@ -641,7 +641,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev
> >> *dev, bool started)
> >>
> >>       if (started) {
> >>           vhost_vdpa_host_notifiers_init(dev);
> >> -        vhost_vdpa_set_vring_ready(dev);
> >> +        vhost_vdpa_set_vring_ready(dev, 1);
> >>       } else {
> >>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> >>       }
> >> @@ -708,6 +708,9 @@ static int vhost_vdpa_get_vring_base(struct
> >> vhost_dev *dev,
> >>   {
> >>       int ret;
> >>
> >> +    /* Deactivate the queue (best effort) */
> >> +    vhost_vdpa_set_vring_ready(dev, 0);
> >> +
> >
> > I don't think it's a good idea to change the state in "get" function,
> >
> > get means "read" not "write".
> Well, if you look at the context of vhost_vdpa_get_vring_base(), the
> only caller is vhost_virtqueue_stop(). Without stopping the hardware
> ahead, it doesn't make sense to effectively get a used_index snapshot
> for resuming/restarting the vq. It might be more obvious and sensible to
> see that were to introduce another Vhost op to suspend the vq right
> before the get_vring_base() call, though I wouldn't bother doing it.
>
> >
> >>       ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
> >>       trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
> >>       return ret;
> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >> index 437347a..2e917d8 100644
> >> --- a/hw/virtio/vhost.c
> >> +++ b/hw/virtio/vhost.c
> >> @@ -1832,15 +1832,15 @@ void vhost_dev_stop(struct vhost_dev *hdev,
> >> VirtIODevice *vdev)
> >>       /* should only be called after backend is connected */
> >>       assert(hdev->vhost_ops);
> >>
> >> -    if (hdev->vhost_ops->vhost_dev_start) {
> >> -        hdev->vhost_ops->vhost_dev_start(hdev, false);
> >> -    }
> >>       for (i = 0; i < hdev->nvqs; ++i) {
> >>           vhost_virtqueue_stop(hdev,
> >>                                vdev,
> >>                                hdev->vqs + i,
> >>                                hdev->vq_index + i);
> >>       }
> >> +    if (hdev->vhost_ops->vhost_dev_start) {
> >> +        hdev->vhost_ops->vhost_dev_start(hdev, false);
> >> +    }
> >>
> >
> > This first idea comes to me is just like this, but at last I don't
> > choose this solution.
> >
> > When we start a device, first we start the virtqueue then
> > vhost_ops->vhost_dev_start.
> >
> > So in stop stage, in my opinion, we should just do the opposite, do as
> > the orignal code do. Change the sequential is a dangerous action.
> I don't see any danger yet, would you please elaborate the specific
> problem you see? I think this sequence is as expected:
>
> 1. suspend each individual vq i.e. stop processing upcoming request, and
> possibly complete inflight requests  -> get_vring_base()
> 2. tear down virtio resources from VMM for e.g. unmap guest memory
> mappings, remove host notifiers, and et al
> 3. reset device -> vhost_vdpa_reset_device()
>
> Regards,
> -Siwei
>
> >
> > Thanks,
> > Michael
> >>       if (vhost_dev_has_iommu(hdev)) {
> >>           if (hdev->vhost_ops->vhost_set_iotlb_callback) {
> >>
> >> Regards,
> >> -Siwei
> >>
> >>>>           vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> >>>> VIRTIO_CONFIG_S_DRIVER);
> >>>>           memory_listener_unregister(&v->listener);
> >>>> --
> >>>> 1.8.3.1
> >>>>
> >>>
> >>
> >>
>

Re: [PATCH] vdpa: Avoid reset when stop device

Reply via email to