Il 17/09/2013 21:51, Michael S. Tsirkin ha scritto: > A much more interesting case is e.g. disabling memory. > E.g. > > config write (disable memory) > read (flush out outstanding writes) > write <- must now have no effect
This works already. memory_region_del_subregion is synchronous, and will remain synchronous wrt the current CPU even after memory dispatch is moved out of the BQL. This is racy of course: VCPU 1 VCPU 2 ------------------------------------------------------------ config write (disable memory) write read This is also racy: VCPU 1 DMA to MMIO region ------------------------------------------------------------ config write (disable memory) write read This is the case where a write from another device can do weird things such as causing a packet to be transmitted on a NIC. As you wrote (and I agree) device removal is a superset of disabling memory and bus master; ergo, if we handle it for disabling memory we need to handle it for device removal too. This is why I believe qemu_del_nic belongs in instance_finalize. > Or disabling bus master: > > config write (disable bus master) > read (flush in outstanding writes) > <- device must now not change memory This doesn't work. The problem is that when you disable bus master any previous call to address_space_map remains mapped. Whoever called address_space_map can and will write blindly to that area. This cannot be fixed by synchronize_rcu() no matter where you place it. The logic is like this address_space_map rcu_read_lock() mr = find memory region for address memory_region_ref(mr) rcu_read_unlock() do actual read or write address_space_unmap memory_region_unref(mr) synchronize_rcu() ensures that future writes will not write to memory. But it does not ensure anything about writes started before. And unfortunately a write "starts" at the moment you start an AIO operation, and lasts until roughly when the AIO callback executes. If you want to fix that, the right place to do it is the PCI core. You need a new callback of some sort in the PCI device, that will synchronously cancel all pending I/O when the bus master bit is set to zero. I don't think address_space_map/unmap has any equivalent PCI transaction (or exists in any other sane bus, for that matter). However, we do it pervasively for obvious performance reasons. I'm afraid that this is an emulation tradeoff that we cannot avoid. > And these rules absolutely must be obeyed, > if you don't you'll break guests. Yes (and I think it was already discussed, I have some deja vu feeling). I'm not sure it is _that_ important (QEMU lived with a no-op bus master bit for years and nothing exploded), but it is certainly good to know what works and what doesn't---and why, and whether it is fixable. > So I think we really should find a way to make the above work correctly. > Removal will follow almost automatically, since it disables memory and > mastering by itself. These patches are concerned with the "almost" part. The difference between config writes and removal is that, as the code is currently written, config writes cannot cause dangling pointers. Paolo