On 18/09/2013 13:56, Peter Maydell wrote:
>> > But does guest code actually care?  In many cases, I suspect that
>> > sticking a smp_rmb() in the read side of "unlocked" register accesses,
>> > and a smp_wmb() in the write side, will do just fine.  And add a
>> > compatibility property to place a device back under the BQL for guests
>> > that have problems.
> Yuck. This sounds like a recipe for spending the next five years
> debugging subtle race conditions. We need to continue to support
> the semantics that the architecture and hardware specs define
> for memory access orderings to emulated devices.
We cannot in the general case; QEMU is not a cycle-exact simulator. You need to look at the particular case.

And if you look at particular cases, you'll find many that are already broken now. For example, we already have no such guarantee for RAM BARs when running under KVM, because accesses do not go through QEMU and are not serialized by the BQL.

Or you could have a device with an MSI vector, program it to write to RAM, and poll the RAM location from the guest. Such a write would currently not be ordered with previous DMA from the device, which contradicts the PCI spec. (This is a bug and can be fixed.)

address_space_map/unmap pretty much breaks any DMA that is concurrent with control register access (e.g. via the PCI command register).

And all these cases are already there! Moving devices outside the BQL of course generates more of them. But it's not like everything is broken.

For example, ordering of memory accesses to one emulated device from one CPU is handled naturally (in either TCG or KVM mode). Ordering of accesses from a CPU with those from the QEMU data-plane code is also handled simply, with locks or memory barriers private to the device.

With multiple VCPUs operating at the same time (e.g. the send path of a network driver running on one VCPU, with the interrupts processed on another VCPU), the activities are likely not independent, and the guest is doing its own synchronization anyway. It's more likely that they use a lock, but they can even do Dekker-style synchronization using MMIO registers, and it will just work as long as the MMIO read/write ops use atomic_mb_read/atomic_mb_set (i.e. as long as the bus ordering guarantees are implemented locally to the device).

There's nothing magic, really. Both PV and real devices have been doing it forever by placing some registers in RAM instead of MMIO, and communicating synchronization points via interrupts and doorbell registers.

But above all, devices have to request BQL-free MMIO explicitly.
You do not have to use it at all; you can just use all the infrastructure to do unlocked bus-master DMA (which is already broken from the ordering point of view anyway). You can limit BQL-free MMIO to PV devices, to extremely simple devices, or to one or two highly optimized registers. There is a huge gamut of choices, and no magic, really.

Paolo