On Monday, July 5, 2021 9:42 PM, David Hildenbrand wrote: > On 03.07.21 04:53, Wang, Wei W wrote: > > On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote: > >> On 02.07.21 04:48, Wang, Wei W wrote: > >>> On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote: > >>>> On 01.07.21 14:51, Peter Xu wrote: > >> > >> I think that clearly shows the issue. > >> > >> My theory I did not verify yet: Assume we have 1GB chunks in the clear > >> bmap. > >> Assume the VM reports all pages within a 1GB chunk as free (easy with > >> a fresh VM). While processing hints, we will clear the bits from the > >> dirty bmap in the RAMBlock. As we will never migrate any page of that > >> 1GB chunk, we will not actually clear the dirty bitmap of the memory > >> region. When re-syncing, we will set all bits bits in the dirty bmap > >> again from the dirty bitmap in the memory region. Thus, many of our > >> hints end up being mostly ignored. The smaller the clear bmap chunk, the > more extreme the issue. > > > > OK, that looks possible. We need to clear the related bits from the > > memory region bitmap before skipping the free pages. Could you try with > below patch: > > I did a quick test (with the memhog example) and it seems like it mostly > works. > However, we're now doing the bitmap clearing from another, racing with the > migration thread. Especially: > > 1. Racing with clear_bmap_set() via cpu_physical_memory_sync_dirty_bitmap() > 2. Racing with migration_bitmap_clear_dirty() > > So that might need some thought, if I'm not wrong.
I think this is similar to the non-clear_bmap case, where the rb->bmap is set or cleared by the migration thread and qemu_guest_free_page_hint. For example, the migration thread could find a bit is set from rb->bmap before qemu_guest_free_page_hint gets that bit cleared in time. The result is that the free page is transferred, which isn't necessary, but won't cause any issue. This is very rare in practice. > > The simplest approach would be removing/freeing the clear_bmap via > PRECOPY_NOTIFY_SETUP(), similar to > precopy_enable_free_page_optimization() we had before. Of course, this will > skip the clear_bmap optimization. Not necessarily, I think. The two optimizations are not mutual exclusive essentially. At least, they work well together now. If there are other implementation issues reported, we could check them then. Best, Wei