On Tue, 19 Nov 2019 18:50:03 -0600 Michael Roth <mdr...@linux.vnet.ibm.com> wrote:
[..]
> I.e. the calling code is only scheduling a one-shot BH for
> virtio_blk_data_plane_stop_bh, but somehow we end up trying to process
> an additional virtqueue entry before we get there. This is likely due
> to the following check in virtio_queue_host_notifier_aio_poll:
>
> static bool virtio_queue_host_notifier_aio_poll(void *opaque)
> {
>     EventNotifier *n = opaque;
>     VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
>     bool progress;
>
>     if (!vq->vring.desc || virtio_queue_empty(vq)) {
>         return false;
>     }
>
>     progress = virtio_queue_notify_aio_vq(vq);
>
> namely the call to virtio_queue_empty(). In this case, since no new
> requests have actually been issued, shadow_avail_idx == last_avail_idx,
> so we actually try to access the vring via vring_avail_idx() to get
> the latest non-shadowed idx:
>
> int virtio_queue_empty(VirtQueue *vq)
> {
>     bool empty;
>     ...
>
>     if (vq->shadow_avail_idx != vq->last_avail_idx) {
>         return 0;
>     }
>
>     rcu_read_lock();
>     empty = vring_avail_idx(vq) == vq->last_avail_idx;
>     rcu_read_unlock();
>     return empty;
>
> but since the IOMMU region has been disabled we get a bogus value (0
> usually), which causes virtio_queue_empty() to falsely report that
> there are entries to be processed, which causes errors such as:
>
> "virtio: zero sized buffers are not allowed"
>
> or
>
> "virtio-blk missing headers"
>
> and puts the device in an error state.
>
I've seen something similar on s390x with virtio-ccw-blk under protected
virtualization, which made me wonder about how virtio-blk in particular,
but also virtio in general, handles shutdown and reset. This makes me
wonder whether bus-mastering being disabled is the only scenario in which
something like vdev->disabled should be used. In particular I have the
following mechanism in mind:

qemu_system_reset() --> ... --> qemu_devices_reset() --> ... -->
--> virtio_[transport]_reset() --> ... -->
--> virtio_bus_stop_ioeventfd() --> virtio_blk_data_plane_stop()

which in turn triggers the following cascade:

virtio_blk_data_plane_stop_bh --> virtio_queue_aio_set_host_notifier_handler() -->
--> virtio_queue_host_notifier_aio_read()

which however calls virtio_queue_notify_aio_vq() if the notifier tests
as positive. Since we still have vq->handle_aio_output, that means we may
call virtqueue_pop() during the reset procedure. This was a problem for
us, because (due to a bug) the shared pages that constitute the virtio
ring weren't shared any more, and thus we got the infamous
virtio_error(vdev, "virtio: zero sized buffers are not allowed").

Now the bug is no more, and we can tolerate that somewhat late access to
the virtio ring. But it keeps nagging me: is it really OK for the device
to access the virtio ring during reset? My intuition tells me that the
device should not look for new requests after it has been told to reset.
Opinions? (Michael, Connie)

Regards,
Halil

> This patch works around the issue by introducing virtio_set_disabled(),
> which sets a 'disabled' flag to bypass checks like virtio_queue_empty()
> when bus-mastering is disabled. Since we'd check this flag at all the
> same sites as vdev->broken, we replace those checks with an inline
> function which checks for either vdev->broken or vdev->disabled.
>
> The 'disabled' flag is only migrated when set, which should be fairly
> rare, but to maintain migration compatibility we disable its use for
> older machine types. Users requiring the use of the flag in conjunction
> with older machine types can set it explicitly as a virtio-device
> option.
>