On Fri, 04/07 14:54, Fam Zheng wrote: > > main loop iothread > ----------------------------------------------------------------------- > blockdev_snapshot > aio_context_acquire(bs->ctx) > bdrv_flush(bs) > bdrv_co_flush(bs) > ... > qemu_coroutine_yield(co) > BDRV_POLL_WHILE() > aio_context_release(bs->ctx) > aio_context_acquire(bs->ctx) > ... > aio_co_wake(co) > aio_poll(qemu_aio_context) ... > co_schedule_bh_cb() ... > qemu_coroutine_enter(co) ... > /* (A) bdrv_co_flush(bs) /* (B) I/O on bs */ > continues... */ > aio_context_release(bs->ctx)
After talking to Kevin on IRC, this aio_context_acquire() in iothread could be the one in vq handler: main loop iothread ----------------------------------------------------------------------- blockdev_snapshot aio_context_acquire(bs->ctx) virtio_scsi_data_plane_handle_cmd bdrv_drained_begin(bs->ctx) bdrv_flush(bs) bdrv_co_flush(bs) aio_context_acquire(bs->ctx).enter ... qemu_coroutine_yield(co) BDRV_POLL_WHILE() aio_context_release(bs->ctx) aio_context_acquire(bs->ctx).return ... aio_co_wake(co) aio_poll(qemu_aio_context) ... co_schedule_bh_cb() ... qemu_coroutine_enter(co) ... /* (A) bdrv_co_flush(bs) /* (B) I/O on bs */ continues... */ aio_context_release(bs->ctx) aio_context_acquire(bs->ctx) Note that in this special case, bdrv_drained_begin() doesn't do the "release, poll, acquire" in BDRV_POLL_WHILE, because bs->in_flight == 0. This might be the root cause of the race? (Ed's test case showed that apart from vq handlers, block jobs in the iothread can also trigger the same pattern of race. For this part we need John's patches to pause block jobs in bdrv_drained_begin.) Fam