On 7/29/19 5:34 PM, Paolo Bonzini wrote:
> dma_aio_cancel unschedules the BH if there is one, which corresponds
> to the reschedule_dma case of dma_blk_cb. This can stall the DMA
> permanently, because dma_complete will never get invoked and therefore
> nobody will ever invoke the original AIO callback in dbs->common.cb.
>
> Fix this by invoking the callback (which is ensured to happen after
> a bdrv_aio_cancel_async, or done manually in the dbs->bh case), and
> add assertions to check that the DMA state machine is indeed waiting
> for dma_complete or reschedule_dma, but never both.
>
> Reported-by: John Snow <js...@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
> dma-helpers.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/dma-helpers.c b/dma-helpers.c
> index 2d7e02d..d3871dc 100644
> --- a/dma-helpers.c
> +++ b/dma-helpers.c
> @@ -90,6 +90,7 @@ static void reschedule_dma(void *opaque)
> {
> DMAAIOCB *dbs = (DMAAIOCB *)opaque;
>
> + assert(!dbs->acb && dbs->bh);
> qemu_bh_delete(dbs->bh);
> dbs->bh = NULL;
> dma_blk_cb(dbs, 0);
> @@ -111,15 +112,12 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
> {
> trace_dma_complete(dbs, ret, dbs->common.cb);
>
> + assert(!dbs->acb && !dbs->bh);
> dma_blk_unmap(dbs);
> if (dbs->common.cb) {
> dbs->common.cb(dbs->common.opaque, ret);
> }
> qemu_iovec_destroy(&dbs->iov);
> - if (dbs->bh) {
> - qemu_bh_delete(dbs->bh);
> - dbs->bh = NULL;
> - }
Now presumably handled by dma_aio_cancel,
> qemu_aio_unref(dbs);
> }
>
> @@ -179,14 +177,21 @@ static void dma_aio_cancel(BlockAIOCB *acb)
>
> trace_dma_aio_cancel(dbs);
>
> + assert(!(dbs->acb && dbs->bh));
> if (dbs->acb) {
> + /* This will invoke dma_blk_cb. */
uhh, does it? this is maybe where I got lost reading this code.
Isn't dbs->acb going to be what was returned from e.g.
dma_blk_read_io_func, which ultimately uses blk_aio_em_aiocb_info, that
has no cancel callback?
I thought this was just going to NOP entirely. No?
> blk_aio_cancel_async(dbs->acb);
> + return;
> }
> +
> if (dbs->bh) {
> cpu_unregister_map_client(dbs->bh);
> qemu_bh_delete(dbs->bh);
> dbs->bh = NULL;
> }
> + if (dbs->common.cb) {
> + dbs->common.cb(dbs->common.opaque, -ECANCELED);
> + }
Well, here at least I am now on terra-firma that we're going to call the
original callback with ECANCELED, which is a step towards code that
isn't surprising my sensibilities.
> }
>
> static AioContext *dma_get_aio_context(BlockAIOCB *acb)
>