Hiya, I was debugging some more of those IDE fuzzer reports and found a DMA cancellation issue I'm not sure I understand. [1]

TL;DR: it's possible to make dma_blk_cb loop forever once it hits the dbs->iov.size == 0 condition. It just keeps re-scheduling dma_blk_cb over and over.
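To make the shape of the problem concrete, here is a toy model of that loop. It is not QEMU code: ToyDMAState, toy_map_never, toy_dma_cb and toy_run are all made-up names, and the only thing borrowed from the real code is the idea that "zero bytes mapped" leads to "schedule the callback again". The driver takes a cap purely so the demonstration terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are QEMU APIs. */
typedef struct {
    size_t mapped;     /* bytes mapped this pass, standing in for dbs->iov.size */
    unsigned resched;  /* how many times the callback re-queued itself */
} ToyDMAState;

/* Hypothetical mapper that never succeeds, like a range that can never map. */
static size_t toy_map_never(void)
{
    return 0;
}

/* One pass of the callback. Returns true if it had to reschedule itself. */
static bool toy_dma_cb(ToyDMAState *s, size_t (*map)(void))
{
    s->mapped = map();
    if (s->mapped == 0) {
        s->resched++;   /* nothing mapped: ask to be run again later */
        return true;
    }
    return false;       /* something mapped: real code would issue I/O here */
}

/* Drive the callback; 'cap' exists only so this demonstration terminates. */
static unsigned toy_run(ToyDMAState *s, size_t (*map)(void), unsigned cap)
{
    unsigned passes = 0;

    while (passes < cap) {
        passes++;
        if (!toy_dma_cb(s, map)) {
            break;
        }
    }
    return passes;
}
```

With a mapper that always returns 0, toy_run() uses every pass the cap allows, and each pass is a reschedule. Nothing inside the callback itself would ever end the loop.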

In this particular qtest reproducer, we wind up asking to map 64K at address 0xffffffff for writing on the i386 machine. Somehow we manage to map 1 byte, and then 0x1000 more bytes (!?), but then we can go no further.
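One observation on the 1-byte figure, as plain arithmetic: starting at 0xffffffff there is exactly one byte left below the 4 GiB boundary. Whether that is really why the first mapping stops at 1 byte is an assumption on my part, and it does not explain the 0x1000 bytes that follow. bytes_below_4g is a made-up helper for illustration, not something in the tree.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many of 'len' bytes starting at 'addr'
 * lie below the 4 GiB boundary of a 32-bit address space.
 */
static uint64_t bytes_below_4g(uint32_t addr, uint64_t len)
{
    uint64_t room = UINT64_C(0x100000000) - addr;

    return len < room ? len : room;
}
```

For the reproducer's request, bytes_below_4g(0xffffffff, 0x10000) is 1, while a 64K request at 0xffff0000 would fit entirely.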

So, seemingly, the map call can fail in a way that will never resolve; and because the dma_blk helpers mediate the callback, control never makes it back to device-level code, so ide_cancel_dma_sync actually can't guarantee it cancels anything.

You can change the condition to a loop, but the DMA will reschedule itself forever, and this hangs.

What is the "reschedule" functionality here supposed to be doing? I assume we are waiting to see if a mapping succeeds later, but this mapping seems like it should never work -- how can we determine the difference between a remap that *might* work later and one that will never work?

How many times should we try to map a certain range? address_space_map warns that scheduling with cpu_register_map_client is only *likely* to allow you to succeed.
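For the sake of discussion, one conceivable shape for an answer is a stall budget: keep retrying while attempts make progress, and give up after some number of consecutive attempts that map nothing. This is only a sketch of the idea. ToyRetryBudget and toy_should_retry are hypothetical names, the budget value is arbitrary, and I am not claiming this is how the dma helpers ought to work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for "how many times should we try?" */
typedef struct {
    unsigned stalls;      /* consecutive attempts that mapped nothing */
    unsigned max_stalls;  /* arbitrary budget chosen by the caller */
} ToyRetryBudget;

/*
 * Record the result of one mapping attempt.
 * Returns true if another attempt is still allowed, false once
 * 'max_stalls' consecutive attempts have made no progress.
 */
static bool toy_should_retry(ToyRetryBudget *b, size_t newly_mapped)
{
    if (newly_mapped > 0) {
        b->stalls = 0;    /* progress: reset the budget */
        return true;
    }
    b->stalls++;
    return b->stalls < b->max_stalls;
}
```

Fed the sequence from the reproducer (1 byte, then 0x1000 bytes, then nothing) with a budget of 3, this allows two stalled retries and refuses on the third. It still cannot tell a mapping that might work later from one that never will; it only bounds how long we wait to find out.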


FWIW -- this bug does show up in the wild. Over the years, people have tried to report it on Launchpad, but I have never been able to reproduce it. Presumably what people are seeing are cases in which they try to cancel DMA, but the in-progress DMA has a mapping that fails (either temporarily or permanently), so we fail to cancel the DMA and QEMU aborts.


[1] Long debugging comment with gorier details: https://bugs.launchpad.net/qemu/+bug/1681439/comments/14

