Commit 7e5cdb345f77d76cb4877fe6230c4e17a7d0d0ca ("ide: Increment BB in-flight counter for TRIM BH") fixed a problem when cancelling trims: ide_cancel_dma_sync() drains the BB to stop a potentially ongoing request, and then asserts that no request is active anymore. When trying to cancel a trim, this would only drain a single blk_aio_pdiscard() operation, but not necessarily the trim as a whole. Said commit fixed that by counting the whole trim as an operation, incrementing the in-flight counter for it, so the blk_drain() in ide_cancel_dma_sync() would wait for it.
Fiona found a problem with this, specifically that draining a BB while an IDE trim operation is going on can produce a deadlock: An ongoing blk_aio_pdiscard() will be settled, but any further discard in the same trim will go into the BB's queued_request list and wait there until the drain stops. Meanwhile, the whole trim keeps the BB's in_flight_counter incremented, so the BB will never be considered drained, and qemu hangs. Therefore, we cannot keep the in-flight counter incremented throughout the trim operation. We should still increment it before the completion BH (ide_trim_bh_cb()) is scheduled and decrement it there, so that the blk_drain() in ide_cancel_dma_sync() can await completion of this scheduled BH. With that change, however, it can take multiple iterations of blk_drain() to really settle the whole trim operation, and so ide_cancel_dma_sync() must run it in a loop. Reported-by: Fiona Ebner <f.eb...@proxmox.com> Fixes: 7e5cdb345f77d76cb4877fe6230c4e17a7d0d0ca ("ide: Increment BB in-flight counter for TRIM BH") Signed-off-by: Hanna Czenczek <hre...@redhat.com> --- Tested with a small test boot sector that issues a trim operation with any number of discards from 0 to 64. Before this patch, doing so hangs starting from 2 discards; and before 7e5cdb345f, doing so crashes qemu with any number of discards. With this patch, it always works. --- hw/ide/core.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index 2d034731cf..54c51084d2 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -444,7 +444,7 @@ static void ide_trim_bh_cb(void *opaque) iocb->bh = NULL; qemu_aio_unref(iocb); - /* Paired with an increment in ide_issue_trim() */ + /* Paired with an increment in ide_issue_trim_cb() */ blk_dec_in_flight(blk); } @@ -504,6 +504,14 @@ static void ide_issue_trim_cb(void *opaque, int ret) done: iocb->aiocb = NULL; if (iocb->bh) { + /* + * Paired with a decrement in ide_trim_bh_cb(): Ensure we have + * an in-flight count while this TrimAIOCB object lives. + * There is no ongoing blk_aio_pdiscard() operation anymore, + * so here we increment the counter manually before returning. + */ + blk_inc_in_flight(s->blk); + replay_bh_schedule_event(iocb->bh); } } @@ -515,9 +523,6 @@ BlockAIOCB *ide_issue_trim( IDEState *s = opaque; TrimAIOCB *iocb; - /* Paired with a decrement in ide_trim_bh_cb() */ - blk_inc_in_flight(s->blk); - iocb = blk_aio_get(&trim_aiocb_info, s->blk, cb, cb_opaque); iocb->s = s; iocb->bh = qemu_bh_new(ide_trim_bh_cb, iocb); @@ -737,11 +742,17 @@ void ide_cancel_dma_sync(IDEState *s) * In the future we'll be able to safely cancel the I/O if the * whole DMA operation will be submitted to disk with a single * aio operation with preadv/pwritev. + * + * Note that TRIM operations call blk_aio_pdiscard() multiple + * times (and finally increment s->blk's in-flight counter while + * ide_trim_bh_cb() is scheduled), so we have to loop blk_drain() + * until the whole operation is done. */ if (s->bus->dma->aiocb) { trace_ide_cancel_dma_sync_remaining(); - blk_drain(s->blk); - assert(s->bus->dma->aiocb == NULL); + while (s->bus->dma->aiocb) { + blk_drain(s->blk); + } } } -- 2.39.1