On 18/09/2013 10:41, Michael S. Tsirkin wrote:
> On Wed, Sep 18, 2013 at 09:40:19AM +0200, Paolo Bonzini wrote:
>> On 18/09/2013 07:48, Michael S. Tsirkin wrote:
>>> So I think the fix is actually obeying ordering rules,
>>> that is, know that a write is in progress
>>> and flush on read.
>>
>> I think this can be modeled as a generic, synchronous
>> (*busmaster_cancel)(PCIDevice*) callback, that is called after bus
>> master is turned off.  You don't even really have to wait for a read.
>
> Not really.
> Bus master is just a single instance of the bigger issue.
> It could be any device-specific register just as well.
>
> PCI reads and writes must obey ordering rules.
> ATM MMIO and DMA achieve this by using a single lock.
There are two issues.

One is synchronization with address_space_map/unmap.  Reads and writes
done through address_space_map/unmap are already handled outside the BQL
(possibly in the kernel), so they already ignore any ordering.  As far as
memory writes are concerned, it is indeed specific to bus master, see
PCIe spec 6.4:

    The ability of the driver and/or system software to block new
    Requests from the device is supported by the Bus Master Enable,
    SERR Enable, and Interrupt Disable bits in the Command register
    (Section 7.5.1.1), and other such control bits.

The second issue is ordering among writes, and between writes and reads.
This does not concern device->memory transactions, because when a device
does DMA it can use smp_*mb() to achieve the ordering.  Similarly, I
think we can ignore message requests (but we probably have hidden bugs
now; for example, you probably need a smp_wmb() in msi{,x}_notify).

But even though this only concerns CPU->device transactions, the
restrictions are very strong.  They apply across distinct devices
(because the ordering is already enforced at the root complex level,
right?), so basically you'd have to wrap each and every MMIO operation
destined to BARs or configuration space with some kind of
pci_global_{read,write}_{start,end} API.  This is likely slower than
just having a "big MMIO lock" around all memory accesses (*).  Devices
can then move part of the processing to a separate thread or bottom
half.

(*) You cannot really do that for all of them anyway; if a BAR is backed
by RAM, accesses would still be unordered since they do not go through
QEMU at all.

But does guest code actually care?  In many cases, I suspect that
sticking a smp_rmb() in the read side of "unlocked" register accesses,
and a smp_wmb() in the write side, will do just fine (see the sketch at
the end of this mail).  We can then add a compatibility property to
place a device back under the BQL for guests that have problems.

Paolo

> If you want to move MMIO and DMA out of a common lock you must find some
> other way to force ordering.
>
>>> I think moving memory region destroy out to finalize makes sense
>>> irrespectively, as long as destroy is made idempotent so we can simply
>>> destroy everything without worrying whether we initialized it.
>>>
>>> The rest of the changes will be harder, we'll have to do
>>> them carefully on a case by case basis.
>>
>> Good, we are in agreement then.
>>
>> Paolo
>
>
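
For concreteness, here is a rough sketch of the barrier placement I have
in mind.  Everything below is made up ("fakedev", its fields, the
FAKEDEV_DONE bit); it is not a tested patch, only the position of the
smp_rmb()/smp_wmb() calls matters.  The pairing is the usual
publish/consume pattern: the completion path orders the DMA'd data
before the status update, and the read handler orders the status load
before the guest's subsequent reads of the result buffer.

/* Hypothetical sketch only: a device whose MMIO BAR is handled
 * outside the BQL.  "fakedev" and all of its fields are invented. */

#include "qemu/atomic.h"
#include "qemu/main-loop.h"
#include "exec/memory.h"
#include "hw/pci/pci.h"
#include "hw/pci/msi.h"

#define FAKEDEV_DONE 0x1

typedef struct FakeDevState {
    PCIDevice dev;
    MemoryRegion mmio;
    QEMUBH *bh;                 /* created with qemu_bh_new at realize */
    uint32_t cmd;
    uint32_t status;
    uint32_t result;
    dma_addr_t result_addr;
} FakeDevState;

/* Write side: make the command visible before kicking the bottom half
 * that will process it. */
static void fakedev_mmio_write(void *opaque, hwaddr addr,
                               uint64_t val, unsigned size)
{
    FakeDevState *s = opaque;

    s->cmd = val;
    smp_wmb();
    qemu_bh_schedule(s->bh);
}

/* Read side: pairs with the smp_wmb() in the completion path, so that
 * if the guest sees FAKEDEV_DONE here, its subsequent reads of the
 * result buffer in RAM see the DMA'd data. */
static uint64_t fakedev_mmio_read(void *opaque, hwaddr addr, unsigned size)
{
    FakeDevState *s = opaque;
    uint32_t val = s->status;

    smp_rmb();
    return val;
}

/* Completion path (bottom half or separate thread): DMA the result
 * into guest memory, then publish completion and raise the interrupt. */
static void fakedev_complete(void *opaque)
{
    FakeDevState *s = opaque;

    pci_dma_write(&s->dev, s->result_addr, &s->result, sizeof(s->result));
    smp_wmb();                  /* data before status and interrupt;
                                   redundant if msi_notify() itself
                                   did a smp_wmb() */
    s->status |= FAKEDEV_DONE;
    msi_notify(&s->dev, 0);
}

static const MemoryRegionOps fakedev_mmio_ops = {
    .read = fakedev_mmio_read,
    .write = fakedev_mmio_write,
    .endianness = DEVICE_LITTLE_ENDIAN,
};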