On Tue, Jul 06, 2021 at 12:05:49PM +0200, David Hildenbrand wrote:
> On 06.07.21 11:41, Wang, Wei W wrote:
> > On Monday, July 5, 2021 9:42 PM, David Hildenbrand wrote:
> > > On 03.07.21 04:53, Wang, Wei W wrote:
> > > > On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
> > > > > On 02.07.21 04:48, Wang, Wei W wrote:
> > > > > > On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
> > > > > > > On 01.07.21 14:51, Peter Xu wrote:
> > > > >
> > > > > I think that clearly shows the issue.
> > > > >
> > > > > My theory, which I did not verify yet: Assume we have 1GB chunks
> > > > > in the clear bmap. Assume the VM reports all pages within a 1GB
> > > > > chunk as free (easy with a fresh VM). While processing hints, we
> > > > > will clear the bits from the dirty bmap in the RAMBlock. As we
> > > > > will never migrate any page of that 1GB chunk, we will not
> > > > > actually clear the dirty bitmap of the memory region. When
> > > > > re-syncing, we will set all bits in the dirty bmap again from
> > > > > the dirty bitmap in the memory region. Thus, many of our hints
> > > > > end up being mostly ignored. The smaller the clear bmap chunk,
> > > > > the more extreme the issue.
> > > >
> > > > OK, that looks possible. We need to clear the related bits from
> > > > the memory region bitmap before skipping the free pages. Could
> > > > you try the patch below:
> > >
> > > I did a quick test (with the memhog example) and it seems to mostly
> > > work. However, we're now doing the bitmap clearing from another
> > > thread, racing with the migration thread. Especially:
> > >
> > > 1. Racing with clear_bmap_set() via
> > >    cpu_physical_memory_sync_dirty_bitmap()
> > > 2. Racing with migration_bitmap_clear_dirty()
> > >
> > > So that might need some thought, if I'm not wrong.
> >
> > I think this is similar to the non-clear_bmap case, where rb->bmap is
> > set or cleared by the migration thread and qemu_guest_free_page_hint.
> > For example, the migration thread could find a bit set in rb->bmap
> > before qemu_guest_free_page_hint gets that bit cleared in time. The
> > result is that the free page is transferred, which isn't necessary,
> > but it won't cause any issue. This is very rare in practice.
>
> Here are my concerns regarding races:
>
> a) We now end up calling migration_clear_memory_region_dirty_bitmap()
> without holding the bitmap_mutex. We have to clarify if that is ok. At
> least migration_bitmap_clear_dirty() holds it *while* clearing the log
> and migration_bitmap_sync() while setting bits in the clear_bmap. I
> think we also have to hold the mutex on the new path.
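As an aside, the theory above is easy to reproduce in a toy model. The
self-contained C program below is illustrative only, not QEMU code; the
arrays are stand-ins for rb->bmap, rb->clear_bmap and the memory region
bitmap, and all identifiers are made up. It mimics the lazy clearing
scheme: the per-RAMBlock bmap is refilled from the memory region bitmap
on every sync, while the memory region bitmap is only cleared lazily,
one chunk at a time, when a page of that chunk is actually sent. A hint
that clears only the bmap is undone by the next sync; a hint that also
triggers the chunk's log-clear sticks. The model is single-threaded, so
it sidesteps the locking questions discussed next.

/*
 * Toy model of the lazy "clear bmap" scheme; not QEMU code.
 * Build with: cc -o model model.c
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define PAGES        16
#define CHUNK_PAGES   8   /* one clear-bmap chunk covers 8 pages */

static bool mr_dirty[PAGES];                  /* memory region bitmap  */
static bool rb_bmap[PAGES];                   /* RAMBlock dirty bmap   */
static bool need_clear[PAGES / CHUNK_PAGES];  /* models rb->clear_bmap */

/*
 * Bitmap sync: pull the dirty info from the memory region into the
 * RAMBlock's bmap, and mark every chunk as needing a lazy log-clear.
 */
static void bitmap_sync(void)
{
    for (int i = 0; i < PAGES; i++) {
        if (mr_dirty[i]) {
            rb_bmap[i] = true;
        }
    }
    memset(need_clear, true, sizeof(need_clear));
}

/*
 * Lazy clear: normally done only when migration actually sends a page
 * of the chunk; clears the memory region bitmap for the whole chunk.
 */
static void clear_chunk(int page)
{
    int chunk = page / CHUNK_PAGES;

    if (need_clear[chunk]) {
        need_clear[chunk] = false;
        memset(&mr_dirty[chunk * CHUNK_PAGES], 0, CHUNK_PAGES);
    }
}

/* Buggy hint handling: clears only the RAMBlock's bmap. */
static void hint_free_buggy(int page)
{
    rb_bmap[page] = false;
}

/* Fixed hint handling: also triggers the chunk's lazy log-clear. */
static void hint_free_fixed(int page)
{
    clear_chunk(page);
    rb_bmap[page] = false;
}

static int dirty_pages(void)
{
    int n = 0;

    for (int i = 0; i < PAGES; i++) {
        n += rb_bmap[i];
    }
    return n;
}

static void run(void (*hint)(int), const char *name)
{
    memset(rb_bmap, 0, sizeof(rb_bmap));
    memset(mr_dirty, true, sizeof(mr_dirty)); /* fresh VM: all dirty  */
    bitmap_sync();
    for (int i = 0; i < PAGES; i++) {
        hint(i);                              /* guest: all pages free */
    }
    bitmap_sync();                            /* next migration round  */
    printf("%s: %d of %d pages dirty again after re-sync\n",
           name, dirty_pages(), PAGES);
}

int main(void)
{
    run(hint_free_buggy, "buggy");  /* expect 16: hints were ignored */
    run(hint_free_fixed, "fixed");  /* expect 0: hints survived      */
    return 0;
}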
Makes sense; I think we can let bitmap_mutex protect both the dirty and
clear bitmaps, and also the dirty pages counter. I'll comment on Wei's
patch later, too.

> b) We now can end up calling memory_region_clear_dirty_bitmap()
> concurrently, not sure if all notifiers can handle that. I'd assume it
> is okay, but I wouldn't bet on it.

Yes, KVM should be the only one that uses it for real, and since there's
the slots lock in kvm_physical_log_clear(), it looks okay so far.

We'd better not turn off the clear dirty log for this; it's a feature
that I know has brought lots of benefits to dirty logging, and we're
even considering enabling it for the dirty ring so that the dirty ring
can also benefit from it.
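To make that locking concrete, below is a minimal standalone sketch,
again illustrative only and not QEMU code: one mutex, standing in for
bitmap_mutex, guards the dirty bmap, the modeled log-clear, and the
dirty pages counter. Because both the hint path and the migration
thread test-and-clear a bit under the same lock before decrementing,
the counter cannot be double-decremented no matter how the two threads
interleave. Built with cc -pthread, it should always print 0.

/*
 * Toy model of the proposed locking; not QEMU code.  One mutex
 * (standing in for RAMState's bitmap_mutex) protects the dirty bmap,
 * the modeled memory region bitmap, and the dirty pages counter.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define PAGES 4096

static pthread_mutex_t bitmap_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool rb_bmap[PAGES];            /* models rb->bmap           */
static bool mr_dirty[PAGES];           /* models the region bitmap  */
static long migration_dirty_pages;     /* models the dirty counter  */

/* Hint path: clear both bitmaps and adjust the counter, under the lock. */
static void free_page_hint(int start, int npages)
{
    pthread_mutex_lock(&bitmap_mutex);
    for (int i = start; i < start + npages; i++) {
        mr_dirty[i] = false;           /* models the lazy log-clear */
        if (rb_bmap[i]) {
            rb_bmap[i] = false;
            migration_dirty_pages--;
        }
    }
    pthread_mutex_unlock(&bitmap_mutex);
}

/* Migration thread: test-and-clear one page, under the same lock. */
static bool migration_clear_dirty(int page)
{
    bool was_dirty;

    pthread_mutex_lock(&bitmap_mutex);
    was_dirty = rb_bmap[page];
    if (was_dirty) {
        mr_dirty[page] = false;
        rb_bmap[page] = false;
        migration_dirty_pages--;
    }
    pthread_mutex_unlock(&bitmap_mutex);
    return was_dirty;
}

static void *hint_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < PAGES; i += 64) {
        free_page_hint(i, 64);
    }
    return NULL;
}

int main(void)
{
    pthread_t t;

    for (int i = 0; i < PAGES; i++) {
        rb_bmap[i] = mr_dirty[i] = true;
    }
    migration_dirty_pages = PAGES;

    pthread_create(&t, NULL, hint_thread, NULL);
    for (int i = 0; i < PAGES; i++) {
        migration_clear_dirty(i);      /* concurrent bmap scan */
    }
    pthread_join(&t, NULL);

    /* Each bit was cleared exactly once, so this must print 0. */
    printf("migration_dirty_pages = %ld\n", migration_dirty_pages);
    return 0;
}

Thanks,

--
Peter Xu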