On 01/11/2010 08:46 AM, Avi Kivity wrote:
On 01/11/2010 04:37 PM, Anthony Liguori wrote:
That has the downside of bouncing a cache line on unrelated exits.
The read and write sides of the ring are widely separated in physical
memory specifically to avoid cache line bouncing.
I meant, exits on random vcpus will cause the cache line containing the
notification disable flag to bounce around. As it is, we read it on
the vcpu that owns the queue and write it on that vcpu or the I/O thread.
Bottom halves are always run from the IO thread.
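For reference, the notification disable flag in question is the used ring's
flags word (VRING_USED_F_NO_NOTIFY). The legacy ring layout, roughly as in
linux/virtio_ring.h, looks like this; the guest-written avail ring and the
host-written used ring are kept well apart (the used ring starts on its own
page), which is the "widely separated" property mentioned above:

#include <stdint.h>

/* Legacy vring layout, roughly as in linux/virtio_ring.h. */

#define VRING_AVAIL_F_NO_INTERRUPT 1   /* guest -> host: don't interrupt me */
#define VRING_USED_F_NO_NOTIFY     1   /* host -> guest: don't kick me */

struct vring_avail {            /* guest-written side */
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[];
};

struct vring_used_elem {
    uint32_t id;                /* descriptor chain head */
    uint32_t len;               /* bytes written into the buffer */
};

struct vring_used {             /* host-written side */
    uint16_t flags;             /* VRING_USED_F_NO_NOTIFY lives here */
    uint16_t idx;
    struct vring_used_elem ring[];
};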
It probably doesn't matter with qemu as it is now, since it will
bounce qemu_mutex, but it will hurt with large guests (especially if
they have many rings).
IMO we should get things to work well without riding on unrelated
exits, especially as we're trying to reduce those exits.
A block I/O request can potentially be very, very long lived. By
serializing requests like this, there's a high likelihood that it's
going to kill performance with anything capable of processing
multiple requests.
Right, that's why I suggested having a queue depth at which disabling
notification kicks in. The patch hardcodes this depth to 1, in unpatched
qemu it is effectively infinite, and a good value is probably spindle count + VAT.
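A minimal sketch of what depth-triggered disabling might look like (the
in-flight counter and threshold field below are invented names, not existing
code; virtio_queue_set_notification() is the existing qemu helper):

/* Sketch only: disable guest->host notifications once enough requests
 * are already in flight, instead of always (depth 1, as in the patch)
 * or never (unpatched qemu).  s->rq_in_flight and s->notify_threshold
 * are hypothetical fields. */
static void virtio_blk_update_notification(VirtIOBlock *s, VirtQueue *vq)
{
    if (s->rq_in_flight >= s->notify_threshold) {
        virtio_queue_set_notification(vq, 0);   /* deep enough: poll the ring */
    } else {
        virtio_queue_set_notification(vq, 1);   /* shallow: take kicks again */
    }
}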
That means we would need a user-visible option, which is quite unfortunate.
Also, that logic only really makes sense with cache=off. With
cache=writethrough, you can get pathological cases where an uncached access
is followed by cached accesses. In fact, with read-ahead, this is probably
not an uncommon scenario.
OTOH, if we aggressively poll the ring when we have an opportunity
to, there's very little downside and it addresses the serialization
problem.
But we can't guarantee that we'll get those opportunities, so it
doesn't address the problem in a general way. A guest that doesn't
use hpet and only has a single virtio-blk device will not have any
reason to exit to qemu.
We can mitigate this with a timer, but honestly we need to do perf
measurements to see. My feeling is that we will need some more
aggressive form of polling than just waiting for IO completion. I don't
think queue depth is enough, because it assumes that all requests are
equal. When dealing with cache=off, or even just storage with its own
cache, that's simply not the case.
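To make the timer idea concrete, something along these lines (purely a
sketch: the interval, the callback name and the extra state are invented;
qemu_mod_timer()/qemu_get_clock() and virtio_queue_set_notification() are
the existing interfaces):

/* Sketch only: while notifications are disabled, re-poll the ring from
 * a vm_clock timer so progress doesn't depend on unrelated exits.
 * s->poll_timer, s->rq_in_flight, s->notify_threshold and
 * POLL_INTERVAL_NS are hypothetical. */
#define POLL_INTERVAL_NS 100000          /* 100us; would need tuning from perf data */

static void virtio_blk_poll_timer(void *opaque)
{
    VirtIOBlock *s = opaque;

    virtio_blk_handle_output(&s->vdev, s->vq);   /* drain newly queued requests */

    if (s->rq_in_flight >= s->notify_threshold) {
        /* still busy, keep polling */
        qemu_mod_timer(s->poll_timer,
                       qemu_get_clock(vm_clock) + POLL_INTERVAL_NS);
    } else {
        /* back to notification-driven operation */
        virtio_queue_set_notification(s->vq, 1);
    }
}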
Regards,
Anthony Liguori