On Mon, Jul 02, 2018 at 04:18:43PM +0100, Stefan Hajnoczi wrote:
> On Fri, Jun 29, 2018 at 03:40:50PM +0300, Denis Plotnikov wrote:
> > There are cases when a request to a block driver state shouldn't have
> > appeared producing dangerous race conditions.
> > This misbehaviour usually happens with storage devices emulated
> > without eventfd for guest to host notifications like IDE.
> >
> > The issue arises when the context is in the "drained" section
> > and doesn't expect the request to come, but request comes from the
> > device not using iothread and which context is processed by the main loop.
> >
> > The main loop, unlike the iothread event loop, isn't blocked by the
> > "drained" section.
> > The request coming and processing while in "drained" section can spoil the
> > block driver state consistency.
> >
> > This behavior can be observed in the following KVM-based case:
> >
> > 1. Setup a VM with an IDE disk.
> > 2. Inside a VM start a disk writing load for the IDE device
> >    e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
> > 3. On the host create a mirroring block job for the IDE device
> >    e.g: drive_mirror <your_IDE> <your_path>
> > 4. On the host finish the block job
> >    e.g: block_job_complete <your_IDE>
> >
> > Having done the 4th action, you could get an assert:
> > assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
> > On my setup, the assert is 1/3 reproducible.
> >
> > The patch series introduces the mechanism to postpone the requests
> > until the BDS leaves "drained" section for the devices not using iothreads.
> > Also, it modifies the asynchronous block backend infrastructure to use
> > that mechanism to fix the assert bug for IDE devices.
>
> I don't understand the scenario. IDE emulation runs in the vcpu and
> main loop threads. These threads hold the global mutex when executing
> QEMU code. If thread A is in a drained region with the global mutex,
> then thread B cannot run QEMU code since it would need the global mutex.
>
> So I guess the problem is not that thread B will submit new requests,
> but maybe that the IDE DMA code will run a completion in thread A and
> submit another request in the drained region?

Ping! :)

Stefan
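
For readers following the thread: the mechanism described in the quoted cover letter (postpone requests that arrive while the BDS is in a "drained" section and replay them once it leaves) could be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch series and not QEMU's API. SketchBDS, sketch_submit(), sketch_drained_begin() and sketch_drained_end() are all hypothetical names, requests are reduced to plain integers, and everything runs in a single thread.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_POSTPONED 16

/* Hypothetical stand-in for a block driver state; NOT QEMU's BlockDriverState. */
typedef struct SketchBDS {
    int drained;                    /* nesting count of drained sections */
    int completed;                  /* number of requests processed so far */
    int postponed[MAX_POSTPONED];   /* requests deferred while drained */
    size_t n_postponed;
} SketchBDS;

/* Placeholder for the real I/O path: here we only count the request. */
static void sketch_process(SketchBDS *bs, int req)
{
    (void)req;
    bs->completed++;
}

/*
 * Submit a request. Returns 1 if it was processed immediately,
 * 0 if it was postponed because the BDS is drained, and -1 if the
 * postponed queue is full.
 */
int sketch_submit(SketchBDS *bs, int req)
{
    if (bs->drained > 0) {
        if (bs->n_postponed == MAX_POSTPONED) {
            return -1;
        }
        bs->postponed[bs->n_postponed++] = req;
        return 0;
    }
    sketch_process(bs, req);
    return 1;
}

void sketch_drained_begin(SketchBDS *bs)
{
    bs->drained++;
}

/* Leave a drained section; replay postponed requests once fully undrained. */
void sketch_drained_end(SketchBDS *bs)
{
    assert(bs->drained > 0);
    if (--bs->drained > 0) {
        return;     /* still inside an outer drained section */
    }
    for (size_t i = 0; i < bs->n_postponed; i++) {
        sketch_process(bs, bs->postponed[i]);
    }
    bs->n_postponed = 0;
}
```

In this sketch a request submitted between sketch_drained_begin() and the matching sketch_drained_end() is never processed inside the drained section, which is the property the assert in mirror_run depends on. Whether the unexpected request really comes from a completion resubmitting in thread A, as asked above, is not something the sketch settles.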