On 18/09/2013 10:41, Michael S. Tsirkin wrote:
> On Wed, Sep 18, 2013 at 09:40:19AM +0200, Paolo Bonzini wrote:
>> On 18/09/2013 07:48, Michael S. Tsirkin wrote:
>>> So I think the fix is actually obeying ordering rules,
>>> that is, know that a write is in progress
>>> and flush on read.
>>
>> I think this can be modeled as a generic, synchronous
>> (*busmaster_cancel)(PCIDevice*) callback, that is called after bus
>> master is turned off.  You don't even really have to wait for a read.
>
> Not really.
> Bus master is just a single instance of the bigger issue.
> It could be any device-specific register just as well.
>
> PCI reads and writes must obey ordering rules.
> ATM MMIO and DMA achieve this by using a single lock.
There are two issues.

One is synchronization with address_space_map/unmap.  Reads and writes
done through address_space_map/unmap are already handled outside the BQL
(possibly in the kernel), so they already ignore any ordering.  As far as
memory writes are concerned, it is indeed specific to bus master, see
PCIe spec 6.4:

    The ability of the driver and/or system software to block new
    Requests from the device is supported by the Bus Master Enable,
    SERR Enable, and Interrupt Disable bits in the Command register
    (Section 7.5.1.1), and other such control bits.

The second issue is ordering among writes, and between writes and reads.
This does not concern device->memory transactions, because when a device
does DMA it can use smp_*mb() to achieve the ordering.  Similarly, I
think we can ignore message requests (but we probably have hidden bugs
now; for example, you probably need a smp_wmb() in msi{,x}_notify).

But even though this only concerns CPU->device transactions, the
restrictions are very strong.  They apply across distinct devices
(because the ordering is already enforced at the root complex level,
right?), so basically you'd have to wrap each and every MMIO operation
destined to BARs or configuration space with some kind of
pci_global_{read,write}_{start,end} API.  This is likely slower than
just having a "big MMIO lock" around all memory accesses (*).  Devices
can then move part of the processing to a separate thread or bottom
half.

(*) You cannot really do that for all of them anyway; if a BAR is backed
by RAM, accesses would still be unordered since they do not go through
QEMU at all.

But does guest code actually care?  In many cases, I suspect that
sticking a smp_rmb() in the read side of "unlocked" register accesses,
and a smp_wmb() in the write side, will do just fine (see the sketch at
the end of this mail).  We can then add a compatibility property to
place a device back under the BQL for guests that have problems.

Paolo

> If you want to move MMIO and DMA out of a common lock you must find some
> other way to force ordering.
>
>>> I think moving memory region destroy out to finalize makes sense
>>> irrespectively, as long as destroy is made idempotent so we can simply
>>> destroy everything without worrying whether we initialized it.
>>>
>>> The rest of the changes will be harder, we'll have to do
>>> them carefully on a case by case basis.
>>
>> Good, we are in agreement then.
>>
>> Paolo
>
>
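
For concreteness, here is a rough sketch of the barrier placement I have
in mind.  Everything below is made up ("fakedev", its fields, the
FAKEDEV_DONE bit); it is not a tested patch, only the position of the
smp_rmb()/smp_wmb() calls matters.  The pairing is the usual
publish/consume pattern: the completion path orders the DMA'd data
before the status update, and the read handler orders the status load
before the guest's subsequent reads of the result buffer.

/* Hypothetical sketch only: a device whose MMIO BAR is handled
 * outside the BQL.  "fakedev" and all of its fields are invented. */

#include "qemu/atomic.h"
#include "qemu/main-loop.h"
#include "exec/memory.h"
#include "hw/pci/pci.h"
#include "hw/pci/msi.h"

#define FAKEDEV_DONE 0x1

typedef struct FakeDevState {
    PCIDevice dev;
    MemoryRegion mmio;
    QEMUBH *bh;                 /* created with qemu_bh_new at realize */
    uint32_t cmd;
    uint32_t status;
    uint32_t result;
    dma_addr_t result_addr;
} FakeDevState;

/* Write side: make the command visible before kicking the bottom half
 * that will process it. */
static void fakedev_mmio_write(void *opaque, hwaddr addr,
                               uint64_t val, unsigned size)
{
    FakeDevState *s = opaque;

    s->cmd = val;
    smp_wmb();
    qemu_bh_schedule(s->bh);
}

/* Read side: pairs with the smp_wmb() in the completion path, so that
 * if the guest sees FAKEDEV_DONE here, its subsequent reads of the
 * result buffer in RAM see the DMA'd data. */
static uint64_t fakedev_mmio_read(void *opaque, hwaddr addr, unsigned size)
{
    FakeDevState *s = opaque;
    uint32_t val = s->status;

    smp_rmb();
    return val;
}

/* Completion path (bottom half or separate thread): DMA the result
 * into guest memory, then publish completion and raise the interrupt. */
static void fakedev_complete(void *opaque)
{
    FakeDevState *s = opaque;

    pci_dma_write(&s->dev, s->result_addr, &s->result, sizeof(s->result));
    smp_wmb();                  /* data before status and interrupt;
                                   redundant if msi_notify() itself
                                   did a smp_wmb() */
    s->status |= FAKEDEV_DONE;
    msi_notify(&s->dev, 0);
}

static const MemoryRegionOps fakedev_mmio_ops = {
    .read = fakedev_mmio_read,
    .write = fakedev_mmio_write,
    .endianness = DEVICE_LITTLE_ENDIAN,
};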