On 27/03/20 16:04, Peter Xu wrote:
> That makes perfect sense to me, however... What if the vcpu generates
> dirty bits _after_ we do KVM_GET_DIRTY_LOG but _before_ we send
> KVM_SET_USER_MEMORY_REGION to remove the memslot? If the vcpu is in
> the userspace I think it's fine because BQL is needed so it won't be
> able to, however the vcpus running inside KVM should not be restricted
> to that. I think those dirty bits will still get lost, but it should
> be extremely hard to trigger.

Yes, you've found a bug.

> I'm not sure whether I missed something above, but if I'm correct, I
> think the solution should be a flag for KVM_SET_USER_MEMORY_REGION to
> set the memslot as invalid (KVM_MEM_INVALID), then when removing the
> memslot which has KVM_MEM_LOG_DIRTY_PAGES enabled, we should:
>
> - send KVM_SET_USER_MEMORY_REGION with KVM_MEM_INVALID to invalidate
>   the memslot, but keep the slot and bitmap in KVM
>
> - send KVM_GET_DIRTY_LOG to fetch the bitmap for the slot
>
> - send KVM_SET_USER_MEMORY_REGION with size==0 to remove the slot

Or KVM_MEM_READONLY.
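That is, something like the following against the raw ioctls. This is
only an illustrative sketch, not QEMU's kvm_set_phys_mem(): the helper
name and error handling are made up, and it assumes KVM accepts
KVM_MEM_READONLY together with KVM_MEM_LOG_DIRTY_PAGES.

#include <linux/kvm.h>
#include <sys/ioctl.h>

static int remove_dirty_logged_slot(int vm_fd,
                                    struct kvm_userspace_memory_region *mem,
                                    void *dirty_bitmap)
{
    struct kvm_dirty_log log = {
        .slot = mem->slot,
        .dirty_bitmap = dirty_bitmap,
    };

    /* 1. Write-protect the slot; the slot and its bitmap stay in KVM. */
    mem->flags |= KVM_MEM_READONLY;
    if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, mem) < 0)
        return -1;

    /* 2. The bitmap can no longer change; fetch it. */
    if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0)
        return -1;

    /* 3. Finally delete the slot (memory_size == 0 removes it). */
    mem->memory_size = 0;
    return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, mem);
}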
> However I don't know whether that'll worth it.

Yes, especially in the light of the dirty ring issue below.

> (Side question which is irrelevant to this: for kvm dirty ring we now
> need to do similar thing to flush dirty bits before removing a
> memslot, however that's even trickier because flushing dirty ring
> needs to kick all vcpu out, currently the RFC series is using
> run_on_cpu() which will release the BQL and wait for all vcpus to
> quit into userspace, however that cannot be done inside
> kvm_set_phys_mem because it needs the BQL. I'm still thinking about
> a good way to fix this, but any idea is greatly welcomed :)

The problem here is also that the GFN is not a unique identifier of the
QEMU ram_addr_t.

However, you don't really need to kick all vCPUs out, do you? You can
protect the dirty ring with its own per-vCPU mutex and harvest the
pages from the main thread.

Paolo
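P.S. A rough sketch of that per-vCPU mutex idea, to make it concrete.
Illustrative only: DirtyRingState, its fields and collect_dirty_page()
are made-up names, the entry layout (struct kvm_dirty_gfn,
KVM_DIRTY_GFN_F_*) and the KVM_RESET_DIRTY_RINGS ioctl are assumed to
be as in the dirty-ring series, and the required memory barriers are
omitted for brevity.

#include <linux/kvm.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/ioctl.h>

typedef struct DirtyRingState {
    pthread_mutex_t lock;        /* serializes harvesting of this ring */
    struct kvm_dirty_gfn *ring;  /* mmap()ed from the vCPU fd          */
    uint32_t size;               /* number of entries (power of two)   */
    uint32_t next;               /* next entry to harvest              */
} DirtyRingState;

/* Hypothetical hook: mark (slot, offset) dirty in QEMU's own bitmaps. */
void collect_dirty_page(uint32_t slot, uint64_t offset);

/*
 * Callable both from the vCPU thread (e.g. on a ring-full exit) and
 * from the main thread (e.g. right before removing a memslot); the
 * mutex makes the two callers mutually exclusive, no run_on_cpu()
 * needed.
 */
static void harvest_dirty_ring(int vm_fd, DirtyRingState *s)
{
    pthread_mutex_lock(&s->lock);

    for (;;) {
        struct kvm_dirty_gfn *e = &s->ring[s->next % s->size];

        if (!(e->flags & KVM_DIRTY_GFN_F_DIRTY)) {
            break;                          /* nothing more published */
        }
        collect_dirty_page(e->slot, e->offset);
        e->flags |= KVM_DIRTY_GFN_F_RESET;  /* hand the entry back    */
        s->next++;
    }

    /* Ask KVM to recycle every entry marked for reset. */
    ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, 0);

    pthread_mutex_unlock(&s->lock);
}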