Hiya, I was debugging some more of those IDE fuzzer reports and found a
DMA cancellation issue I'm not sure I understand. [1]
TL;DR: it's possible to make dma_blk_cb loop on itself forever via the
dbs->iov.size == 0 condition. It just keeps re-scheduling dma_blk_cb
over and over.
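To make that failure mode concrete, here is a toy model of the re-scheduling pattern. It is not QEMU code: toy_dma, toy_map, toy_blk_cb and toy_run are hypothetical names, and a bounded while loop stands in for the bottom half. The assumption is that map attempts return 0 bytes once a fixed amount (`mappable`) has been mapped, which loosely mirrors the reproducer but lumps its separate pieces into one chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy request; NOT QEMU's DMAAIOCB. */
struct toy_dma {
    size_t done;      /* bytes already mapped and "submitted" */
    size_t wanted;    /* total bytes the guest asked for */
    size_t mappable;  /* assumption: bytes that can ever be mapped */
    bool pending;     /* callback is scheduled to run again */
};

/* Hypothetical mapper: grants at most what is left of `mappable`. */
static size_t toy_map(const struct toy_dma *d, size_t len)
{
    size_t left = d->mappable > d->done ? d->mappable - d->done : 0;
    return len < left ? len : left;
}

/* Same shape as the "iov.size == 0 -> reschedule" branch. */
static void toy_blk_cb(struct toy_dma *d)
{
    size_t chunk;

    d->pending = false;
    if (d->done == d->wanted) {
        return;                      /* whole request finished */
    }
    chunk = toy_map(d, d->wanted - d->done);
    if (chunk == 0) {
        d->pending = true;           /* nothing mapped: reschedule, return */
        return;
    }
    d->done += chunk;                /* "submit" the partial I/O ... */
    d->pending = true;               /* ... whose completion re-enters us */
}

/* Run the toy loop for at most max_iters callbacks; return how many ran. */
static unsigned toy_run(struct toy_dma *d, unsigned max_iters)
{
    unsigned iters = 0;

    assert(max_iters > 0);
    d->pending = true;               /* initial submission */
    while (d->pending && iters < max_iters) {
        toy_blk_cb(d);
        iters++;
    }
    return iters;
}
```

With mappable = 0x10000 a 64K request finishes after two callbacks; with mappable = 0x1001 it transfers 0x1001 bytes and then spins until the cap, which is the toy analogue of the hang.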
In this particular qtest reproducer, we wind up asking to map 64K for
writing at address 0xffffffff on the i386 machine. Somehow we manage to
map 1 byte, and then 0x1000 more bytes (!?), but then we can go no further.
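The first split is at least consistent with the request straddling the 4 GiB line: a 64K range starting at 0xffffffff has exactly one byte at or below 0xffffffff, and its last byte is 0x10000fffe. The helpers below (hypothetical names, plain arithmetic) check that; they say nothing about where the extra 0x1000 bytes come from.

```c
#include <assert.h>
#include <stdint.h>

#define FOUR_GIB 0x100000000ULL

/* Bytes of [addr, addr + len) that lie below `boundary`. */
static uint64_t bytes_below(uint64_t addr, uint64_t len, uint64_t boundary)
{
    uint64_t room;

    if (addr >= boundary) {
        return 0;
    }
    room = boundary - addr;
    return len < room ? len : room;
}

/* Address of the last byte in a non-empty range. */
static uint64_t last_byte(uint64_t addr, uint64_t len)
{
    assert(len > 0);
    return addr + len - 1;
}
```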
So it seems the map operation can fail in a way that will never
resolve. Because the dma_blk helpers mediate the callback and never make
it back to device-level code, ide_cancel_dma_sync can't actually
guarantee that it cancels anything.
You can turn the condition into a loop, but since the DMA reschedules
itself forever, that just hangs.
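A toy illustration of why looping in the canceller only helps if the rescheduling path can see the cancellation. Everything here is hypothetical (toy_req, its `cancelled` flag, toy_cb, toy_cancel_sync); it is not how the dma_blk helpers or ide_cancel_dma_sync are written, and it ignores whether completing a half-transferred request early is actually safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy request whose mapping either always or never works. */
struct toy_req {
    bool can_map;     /* assumption: fixed for the request's lifetime */
    bool cancelled;   /* hypothetical flag set by the canceller */
    bool pending;     /* callback is scheduled to run again */
    int ret;          /* 1 = in flight, 0 = completed, -1 = cancelled */
};

static void toy_cb(struct toy_req *r, bool check_cancel)
{
    r->pending = false;
    if (check_cancel && r->cancelled) {
        r->ret = -1;                 /* complete early instead of retrying */
        return;
    }
    if (!r->can_map) {
        r->pending = true;           /* map failed: reschedule ourselves */
        return;
    }
    r->ret = 0;
}

/* "Cancel, then loop until idle", capped at max_iters so the toy can't hang.
 * Returns the number of callbacks that ran. */
static unsigned toy_cancel_sync(struct toy_req *r, bool check_cancel,
                                unsigned max_iters)
{
    unsigned iters = 0;

    assert(max_iters > 0);
    r->cancelled = true;
    while (r->pending && iters < max_iters) {
        toy_cb(r, check_cancel);
        iters++;
    }
    return iters;
}
```

Without the check, the capped loop exhausts its budget and the request is still in flight; with it, one pass is enough.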
What is the "reschedule" functionality here supposed to be doing? I
assume we are waiting to see if a mapping succeeds later, but this
mapping seems like it should never work. How can we tell the difference
between a remap that *might* work later and one that never will?
How many times should we try to map a given range? The comment on
address_space_map only promises that cpu_register_map_client tells you
when a retry is *likely* to succeed.
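One possible answer to "how many times" is a stall budget: keep retrying while retries still make progress, give up after N consecutive zero-length maps, and give up immediately when the request itself holds the resource it is waiting for, since nothing else can release it. This is only a sketch: retry_state, on_map_result, max_stalls and holds_resource are hypothetical names, the budget is an arbitrary tunable, and whether the dma_blk helpers could know holds_resource at all is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum map_action { MAP_PROCEED, MAP_RETRY, MAP_FAIL };

/* Hypothetical per-request retry bookkeeping. */
struct retry_state {
    unsigned stalls;      /* consecutive attempts that mapped nothing */
    unsigned max_stalls;  /* assumption: tunable budget, not a QEMU value */
};

/* Decide what to do after one map attempt that yielded `mapped` bytes.
 * holds_resource: hypothetical signal that this request itself holds
 * what it is waiting for, so no future retry can succeed. */
static enum map_action on_map_result(struct retry_state *rs, size_t mapped,
                                     bool holds_resource)
{
    assert(rs != NULL);
    if (mapped > 0) {
        rs->stalls = 0;           /* progress: reset the budget */
        return MAP_PROCEED;
    }
    if (holds_resource) {
        return MAP_FAIL;          /* waiting on ourselves: never resolves */
    }
    if (++rs->stalls > rs->max_stalls) {
        return MAP_FAIL;          /* budget exhausted: report an error */
    }
    return MAP_RETRY;             /* possibly transient: wait and try again */
}
```

The design choice is that a hopeless wait surfaces as an error the caller can act on, rather than as a silent reschedule.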
FWIW, this bug does show up in the wild. Over the years, people have
tried to report it on Launchpad, but I have never been able to
reproduce it. Presumably what people are seeing are cases in which DMA
is being cancelled, but the in-progress DMA has a mapping that fails
(either temporarily or permanently), we fail to cancel the DMA, and
QEMU aborts.
[1] Long debugging comment with gorier details:
https://bugs.launchpad.net/qemu/+bug/1681439/comments/14
dma_blk helpers and infinite dma_memory_map retries (John Snow)