On Sat, Mar 26, 2022 at 3:59 AM Si-Wei Liu <si-wei....@oracle.com> wrote: > > > > On 3/25/2022 12:18 AM, Michael Qiu wrote: > > > > > > On 2022/3/25 14:32, Si-Wei Liu Wrote: > >> > >> > >> On 3/23/2022 2:20 AM, Jason Wang wrote: > >>> Adding Eugenio, and Ling Shan. > >>> > >>> On Wed, Mar 23, 2022 at 4:58 PM <08005...@163.com> wrote: > >>>> From: Michael Qiu <qiud...@archeros.com> > >>>> > >>>> Currently, when VM poweroff, it will trigger vdpa > >>>> device(such as mlx bluefield2 VF) reset twice, this leads > >>>> to below issue: > >>>> > >>>> vhost VQ 2 ring restore failed: -22: Invalid argument (22) > >>>> > >>>> This because in vhost_dev_stop(), qemu tries to stop the device, > >>>> then stop the queue: vhost_virtqueue_stop(). > >>>> In vhost_dev_stop(), it resets the device, which clear some flags > >>>> in low level driver, and the driver finds > >>>> that the VQ is invalied, this is the root cause. > >>>> > >>>> Actually, device reset will be called within func release() > >>>> > >>>> To solve the issue, vdpa should set vring unready, and > >>>> remove reset ops in device stop: vhost_dev_start(hdev, false). > >>> This is an interesting issue. Do you see a real issue except for the > >>> above warnings. > >>> > >>> The reason we "abuse" reset is that we don't have a stop uAPI for > >>> vhost. We plan to add a status bit to stop the whole device in the > >>> virtio spec, but considering it may take a while maybe we can first > >>> introduce a new uAPI/ioctl for that. > >> Yep. What was missing here is a vdpa specific uAPI for per-virtqueue > >> stop/suspend rather than spec level amendment to stop the whole > >> device (including both vq and config space). For now we can have vDPA > >> specific means to control the vq, something vDPA hardware vendor must > >> support for live migration, e.g. datapath switching to shadow vq. I > >> believe the spec amendment may follow to define a bit for virtio > >> feature negotiation later on if needed (FWIW virtio-vdpa already does > >> set_vq_ready(..., 0) to stop the vq). > >> > >> However, there's a flaw in this patch, see below. > >>> > >>> Note that the stop doesn't just work for virtqueue but others like, > >>> e.g config space. But considering we don't have config interrupt > >>> support right now, we're probably fine. > >>> > >>> Checking the driver, it looks to me only the IFCVF's set_vq_ready() is > >>> problematic, Ling Shan, please have a check. And we probably need a > >>> workaround for vp_vdpa as well. > >>> > >>> Anyhow, this seems to be better than reset. So for 7.1: > >>> > >>> Acked-by: Jason Wang <jasow...@redhat.com> > >>> > >>>> Signed-off-by: Michael Qiu<qiud...@archeros.com> > >>>> --- > >>>> hw/virtio/vhost-vdpa.c | 8 ++++---- > >>>> 1 file changed, 4 insertions(+), 4 deletions(-) > >>>> > >>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > >>>> index c5ed7a3..d858b4f 100644 > >>>> --- a/hw/virtio/vhost-vdpa.c > >>>> +++ b/hw/virtio/vhost-vdpa.c > >>>> @@ -719,14 +719,14 @@ static int vhost_vdpa_get_vq_index(struct > >>>> vhost_dev *dev, int idx) > >>>> return idx; > >>>> } > >>>> > >>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev) > >>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, > >>>> unsigned int ready) > >>>> { > >>>> int i; > >>>> trace_vhost_vdpa_set_vring_ready(dev); > >>>> for (i = 0; i < dev->nvqs; ++i) { > >>>> struct vhost_vring_state state = { > >>>> .index = dev->vq_index + i, > >>>> - .num = 1, > >>>> + .num = ready, > >>>> }; > >>>> vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state); > >>>> } > >>>> @@ -1088,8 +1088,9 @@ static int vhost_vdpa_dev_start(struct > >>>> vhost_dev *dev, bool started) > >>>> if (unlikely(!ok)) { > >>>> return -1; > >>>> } > >>>> - vhost_vdpa_set_vring_ready(dev); > >>>> + vhost_vdpa_set_vring_ready(dev, 1); > >>>> } else { > >>>> + vhost_vdpa_set_vring_ready(dev, 0); > >>>> ok = vhost_vdpa_svqs_stop(dev); > >>>> if (unlikely(!ok)) { > >>>> return -1; > >>>> @@ -1105,7 +1106,6 @@ static int vhost_vdpa_dev_start(struct > >>>> vhost_dev *dev, bool started) > >>>> memory_listener_register(&v->listener, > >>>> &address_space_memory); > >>>> return vhost_vdpa_add_status(dev, > >>>> VIRTIO_CONFIG_S_DRIVER_OK); > >>>> } else { > >>>> - vhost_vdpa_reset_device(dev); > >> Unfortunately, the reset can't be be removed from here as this code > >> path usually involves virtio reset or status change for e.g. invoked > >> via virtio_net_set_status(... , 0). Ideally we should use the > >> VhostOps.vhost_reset_device() to reset the vhost-vdpa device where > >> status change is involved after vhost_dev_stop() is done, but this > >> distinction is not there yet as of today in all of the virtio devices > >> except vhost_user_scsi. > >> > >> Alternatively we may be able to do something like below, stop the > >> virtqueue in vhost_vdpa_get_vring_base() in the > >> vhost_virtqueue_stop() context. Only until the hardware vq is > >> stopped, svq can stop and unmap then vhost-vdpa would reset the > >> device status. It kinda works, but not in a perfect way... > As I indicated above, this is an less ideal way to address the issue you > came across about, without losing functionality or introducing > regression to the code. Ideally it'd be best to get fixed in a clean > way, though that would a little more effort in code refactoring. > Personally I feel that the error message you saw is somewhat benign and > don't think it caused any real problem. Did you see trouble if living > with the bogus error message for the moment?
Should be fine for networking devices at least since we don't care about duplicated packets (restore last_avail_idx as used_idx). But it might be problematic for types of devices. Thanks > > >> > >> --- a/hw/virtio/vhost-vdpa.c > >> +++ b/hw/virtio/vhost-vdpa.c > >> @@ -564,14 +564,14 @@ static int vhost_vdpa_get_vq_index(struct > >> vhost_dev *dev, int idx) > >> return idx; > >> } > >> > >> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev) > >> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, int > >> enable) > >> { > >> int i; > >> trace_vhost_vdpa_set_vring_ready(dev); > >> for (i = 0; i < dev->nvqs; ++i) { > >> struct vhost_vring_state state = { > >> .index = dev->vq_index + i, > >> - .num = 1, > >> + .num = enable, > >> }; > >> vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state); > >> } > >> @@ -641,7 +641,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev > >> *dev, bool started) > >> > >> if (started) { > >> vhost_vdpa_host_notifiers_init(dev); > >> - vhost_vdpa_set_vring_ready(dev); > >> + vhost_vdpa_set_vring_ready(dev, 1); > >> } else { > >> vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs); > >> } > >> @@ -708,6 +708,9 @@ static int vhost_vdpa_get_vring_base(struct > >> vhost_dev *dev, > >> { > >> int ret; > >> > >> + /* Deactivate the queue (best effort) */ > >> + vhost_vdpa_set_vring_ready(dev, 0); > >> + > > > > I don't think it's a good idea to change the state in "get" function, > > > > get means "read" not "write". > Well, if you look at the context of vhost_vdpa_get_vring_base(), the > only caller is vhost_virtqueue_stop(). Without stopping the hardware > ahead, it doesn't make sense to effectively get a used_index snapshot > for resuming/restarting the vq. It might be more obvious and sensible to > see that were to introduce another Vhost op to suspend the vq right > before the get_vring_base() call, though I wouldn't bother doing it. > > > > >> ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring); > >> trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num); > >> return ret; > >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > >> index 437347a..2e917d8 100644 > >> --- a/hw/virtio/vhost.c > >> +++ b/hw/virtio/vhost.c > >> @@ -1832,15 +1832,15 @@ void vhost_dev_stop(struct vhost_dev *hdev, > >> VirtIODevice *vdev) > >> /* should only be called after backend is connected */ > >> assert(hdev->vhost_ops); > >> > >> - if (hdev->vhost_ops->vhost_dev_start) { > >> - hdev->vhost_ops->vhost_dev_start(hdev, false); > >> - } > >> for (i = 0; i < hdev->nvqs; ++i) { > >> vhost_virtqueue_stop(hdev, > >> vdev, > >> hdev->vqs + i, > >> hdev->vq_index + i); > >> } > >> + if (hdev->vhost_ops->vhost_dev_start) { > >> + hdev->vhost_ops->vhost_dev_start(hdev, false); > >> + } > >> > > > > This first idea comes to me is just like this, but at last I don't > > choose this solution. > > > > When we start a device, first we start the virtqueue then > > vhost_ops->vhost_dev_start. > > > > So in stop stage, in my opinion, we should just do the opposite, do as > > the orignal code do. Change the sequential is a dangerous action. > I don't see any danger yet, would you please elaborate the specific > problem you see? I think this sequence is as expected: > > 1. suspend each individual vq i.e. stop processing upcoming request, and > possibly complete inflight requests -> get_vring_base() > 2. tear down virtio resources from VMM for e.g. unmap guest memory > mappings, remove host notifiers, and et al > 3. reset device -> vhost_vdpa_reset_device() > > Regards, > -Siwei > > > > > Thanks, > > Michael > >> if (vhost_dev_has_iommu(hdev)) { > >> if (hdev->vhost_ops->vhost_set_iotlb_callback) { > >> > >> Regards, > >> -Siwei > >> > >>>> vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE | > >>>> VIRTIO_CONFIG_S_DRIVER); > >>>> memory_listener_unregister(&v->listener); > >>>> -- > >>>> 1.8.3.1 > >>>> > >>> > >> > >> >