On 03/01/2017 03:29 PM, Paolo Bonzini wrote: > > > On 01/03/2017 14:22, Halil Pasic wrote: >> Here a trace: >> >> 135871@1488304024.512533:virtio_blk_req_complete req 0x2aa6b117e10 status 0 >> 135871@1488304024.512541:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq >> 0x2aa6b4c0870 >> 135871@1488304024.522607:virtio_blk_req_complete req 0x2aa6b118980 status 0 >> 135871@1488304024.522616:virtio_blk_req_complete req 0x2aa6b119260 status 0 >> 135871@1488304024.522627:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq >> 0x2aa6b4c0870 >> 135871@1488304024.527386:virtio_blk_req_complete req 0x2aa6b118980 status 0 >> 135871@1488304024.527431:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq >> 0x2aa6b4c0870 >> 135871@1488304024.528611:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq >> 0x2aa6b4de880 >> 135871@1488304024.528628:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq >> 0x2aa6b4de8f8 >> 135871@1488304024.528753:virtio_blk_data_plane_stop dataplane 0x2aa6b0e5540 >> ^== DATAPLANE STOP >> 135871@1488304024.530709:virtio_blk_req_complete req 0x2aa6b117e10 status 0 >> 135871@1488304024.530752:virtio_guest_notifier_read vdev 0x2aa6b0e19d8 vq >> 0x2aa6b4c0870 >> ^== comes from k->set_guest_notifiers(qbus->parent, >> nvqs, false); >> in virtio_blk_data_plane_stop and done >> immediately after >> irqfd is cleaned up by the transport >> 135871@1488304024.530836:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq >> 0x2aa6b4c0870 >> halil: error in event_notifier_set: Bad file descriptor >> ^== here we have the problem >> >> If you want a stacktrace that can be arranged to. >> >>> like a reset should cause it (the only call in virtio-blk is from >>> virtio_blk_data_plane_stop), and then the guest doesn't care anymore >>> about interrupts. >> I do not understand this with 'doesn't care anymore about interrupts'. >> I was debugging a virtio-blk device being stuck waiting for a host >> notification (interrupt) after migration. > > Ok, this explains it better then. The issue is that > virtio_blk_data_plane_stop doesn't flush the bottom half, which you want > to do when the caller is, for example, virtio_ccw_vmstate_change. > > Does it work if you call to qemu_bh_cancel(s->bh) and notify_guest_bh(s) > after > > blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context()); > > ? >
With --- a/hw/block/dataplane/virtio-blk.c +++ b/hw/block/dataplane/virtio-blk.c @@ -260,6 +260,8 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev) /* Drain and switch bs back to the QEMU main loop */ blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context()); + qemu_bh_cancel(s->bh); + notify_guest_bh(s); applied I do not see the problem any more. I will most likely turn this into a patch tomorrow. I would like to give it some more testing and thinking (see questions below) until tomorrow. I should probably cc stable, or? I would also like to do some diagnostic stuff if virtio_notify_irqfd fails. Maybe assert success for event_notifier_set. Would that be OK with you? I have a couple of questions about the ways of the dataplane code. If you are too busy, feel free to not answer -- I will keep thinking myself. Q1. For this to work correctly, it seems to me, we need to be sure that virtio_blk_req_complete can not be happen between the newly added notify_guest_bh(s); and vblk->dataplane_started = false; becomes visible. How is this ensured? Q2. The virtio_blk_data_plane_stop should be from the thread/context associated with the main event loop, and with that vblk->dataplane_started = false too. But I think dataplane_started may end up being used form a different thread (e.g. req_complete). How does the sequencing work there and/or is it even important? Regards, Halil