On 19/05/15 22:01, John Snow wrote:

>> Thanks John. Even though you haven't managed to figure out the problem
>> the patchset attempts to solve, were you at least able to reproduce the
>> image corruption locally?
>>
>> That said, the patchset is still worth including just for the fact that
>> it fixes the flaky CDROM detection here.
>>
>>
>> ATB,
>>
>> Mark.
>>
> 
> Yeah, I reproduced the problem you're describing and spent a chunk of my
> time debugging it and trying to figure out a section of the trace that
> coincides with "the problem," but was unable to find anything of
> particular interest.

Well, that's definitely a good start :) I did spend some time enabling
tracepoints on the block layer for both good and bad commits, and the
only obvious difference I could see was in how multiple read/write
requests were batched.

> I do notice that sometimes we appear to start a new transfer almost
> immediately after one completes, but the code in place to sleep that
> action until the guest finishes programming the DMA command seems to
> catch it and nothing gets maliciously perturbed.
> 
> I still wonder somewhat that with the move to async and the strange
> order of how darwin appears to program DMA transfers that we're hitting
> some weird race, but I think that how reliably I hit the exact same
> problem means that I should think again :)
> 
> I'll keep poking.

The only other thing I had in the back of my mind was whether the async
code needs some kind of extra write barrier implemented when used in
this way. Unfortunately I haven't had a chance to dig into the block
layer to figure out how to attempt this yet.


ATB,

Mark.

