On 19/05/15 22:01, John Snow wrote:

>> Thanks John. Even though you haven't managed to figure out the problem
>> the patchset attempts to solve, were you at least able to reproduce the
>> image corruption locally?
>>
>> That said, the patchset is still worth including just for the fact that
>> it fixes the flaky CDROM detection here.
>>
>>
>> ATB,
>>
>> Mark.
>>
>
> Yeah, I reproduced the problem you're describing and spent a chunk of my
> time debugging it and trying to figure out a section of the trace that
> coincides with "the problem," but was unable to find anything of
> particular interest.

Well that's definitely a good start :) I did spend some time enabling
tracepoints on the block layer for both good and bad commits, and the
only obvious difference I could see was the batching between multiple
read/write requests.

> I do notice that sometimes we appear to start a new transfer almost
> immediately after one completes, but the code in place to sleep that
> action until the guest finishes programming the DMA command seems to
> catch it and nothing gets maliciously perturbed.
>
> I still wonder somewhat that with the move to async and the strange
> order of how darwin appears to program DMA transfers that we're hitting
> some weird race, but I think that how reliably I hit the exact same
> problem means that I should think again :)
>
> I'll keep poking.

The only other thing I had in the back of my mind was whether the async
code needs some kind of extra write barrier implemented when used in
this way. Unfortunately I haven't had a chance to dig into the block
layer to figure out how to attempt this yet.


ATB,

Mark.