On Thu, Nov 28, 2019 at 05:48:00PM +0100, Halil Pasic wrote:
> On Tue, 19 Nov 2019 18:50:03 -0600
> Michael Roth <mdr...@linux.vnet.ibm.com> wrote:
>
> [..]
>
> > I.e. the calling code is only scheduling a one-shot BH for
> > virtio_blk_data_plane_stop_bh, but somehow we end up trying to process
> > an additional virtqueue entry before we get there. This is likely due
> > to the following check in virtio_queue_host_notifier_aio_poll:
> >
> >   static bool virtio_queue_host_notifier_aio_poll(void *opaque)
> >   {
> >       EventNotifier *n = opaque;
> >       VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
> >       bool progress;
> >
> >       if (!vq->vring.desc || virtio_queue_empty(vq)) {
> >           return false;
> >       }
> >
> >       progress = virtio_queue_notify_aio_vq(vq);
> >
> > namely the call to virtio_queue_empty(). In this case, since no new
> > requests have actually been issued, shadow_avail_idx == last_avail_idx,
> > so we actually try to access the vring via vring_avail_idx() to get
> > the latest non-shadowed idx:
> >
> >   int virtio_queue_empty(VirtQueue *vq)
> >   {
> >       bool empty;
> >       ...
> >
> >       if (vq->shadow_avail_idx != vq->last_avail_idx) {
> >           return 0;
> >       }
> >
> >       rcu_read_lock();
> >       empty = vring_avail_idx(vq) == vq->last_avail_idx;
> >       rcu_read_unlock();
> >       return empty;
> >
> > but since the IOMMU region has been disabled we get a bogus value (0
> > usually), which causes virtio_queue_empty() to falsely report that
> > there are entries to be processed, which causes errors such as:
> >
> >   "virtio: zero sized buffers are not allowed"
> >
> > or
> >
> >   "virtio-blk missing headers"
> >
> > and puts the device in an error state.
>
> I've seen something similar on s390x with virtio-ccw-blk under
> protected virtualization, which made me wonder how virtio-blk in
> particular, and virtio in general, handle shutdown and reset.
>
> This makes me wonder whether bus-mastering being disabled is the only
> scenario in which something like vdev->disabled should be used.
>
> In particular, I have the following mechanism in mind:
>
>   qemu_system_reset() --> ... --> qemu_devices_reset() --> ... -->
>   --> virtio_[transport]_reset() --> ... --> virtio_bus_stop_ioeventfd()
>   --> virtio_blk_data_plane_stop()
>
> which in turn triggers the following cascade:
>
>   virtio_blk_data_plane_stop_bh -->
>   virtio_queue_aio_set_host_notifier_handler() -->
>   --> virtio_queue_host_notifier_aio_read()
>
> which however calls virtio_queue_notify_aio_vq() if the notifier tests
> as positive.
>
> Since we still have vq->handle_aio_output, that means we may call
> virtqueue_pop() during the reset procedure.
>
> This was a problem for us because (due to a bug) the shared pages that
> constitute the virtio ring weren't shared any more, and thus we got the
> infamous
>
>   virtio_error(vdev, "virtio: zero sized buffers are not allowed").
>
> Now the bug is no more, and we can tolerate that somewhat late access
> to the virtio ring.
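
For reference, the handler at the end of that cascade looks roughly like
the following (paraphrased from hw/virtio/virtio.c as of around this
time; the comments are mine):

    static void virtio_queue_host_notifier_aio_read(EventNotifier *n)
    {
        VirtQueue *vq = container_of(n, VirtQueue, host_notifier);

        /* "tests as positive" above: a guest kick is still pending */
        if (event_notifier_test_and_clear(n)) {
            /* runs vq->handle_aio_output, which may virtqueue_pop() */
            virtio_queue_notify_aio_vq(vq);
        }
    }

So a kick that is still pending while the data plane is being torn down
gets processed as if it were a fresh request.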

> But it keeps nagging me: is it really OK for the device to access the
> virtio ring during reset? My intuition tells me that the device should
> not look for new requests after it has been told to reset.

Well, it's after it was told to reset, but it's not after it completed
reset. So I think it's fine ...

> Opinions? (Michael, Connie)
>
> Regards,
> Halil
>
> > This patch works around the issue by introducing virtio_set_disabled(),
> > which sets a 'disabled' flag to bypass checks like virtio_queue_empty()
> > when bus-mastering is disabled. Since we'd check this flag at all the
> > same sites as vdev->broken, we replace those checks with an inline
> > function which checks for either vdev->broken or vdev->disabled.
> >
> > The 'disabled' flag is only migrated when set, which should be fairly
> > rare, but to maintain migration compatibility we disable its use for
> > older machine types. Users requiring the use of the flag in conjunction
> > with older machine types can set it explicitly as a virtio-device
> > option.
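
To make the idea concrete, the description suggests check sites of
roughly the following shape (only a sketch of what the text above
implies -- the name of the inline helper is my guess, and
virtio_queue_empty() is just one example site):

    /* one predicate covering both "broken" and "bus-mastering disabled" */
    static inline bool virtio_device_disabled(VirtIODevice *vdev)
    {
        return unlikely(vdev->disabled || vdev->broken);
    }

    /* called by the transport, e.g. when bus-mastering is toggled */
    void virtio_set_disabled(VirtIODevice *vdev, bool disable)
    {
        vdev->disabled = disable;
    }

    int virtio_queue_empty(VirtQueue *vq)
    {
        if (virtio_device_disabled(vq->vdev)) {
            return 1;   /* report empty rather than touch the vring */
        }
        ...
    }

With that in place, a device whose bus-mastering has been disabled
reports its queues as empty instead of reading a bogus avail idx through
the now-disabled IOMMU region.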