On Sat, Jul 03, 2021 at 02:53:27AM +0000, Wang, Wei W wrote:
> On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
> > On 02.07.21 04:48, Wang, Wei W wrote:
> > > On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
> > >> On 01.07.21 14:51, Peter Xu wrote:
> > 
> > I think that clearly shows the issue.
> > 
> > My theory I did not verify yet: Assume we have 1GB chunks in the clear bmap.
> > Assume the VM reports all pages within a 1GB chunk as free (easy with a
> > fresh VM). While processing hints, we will clear the bits from the dirty
> > bmap in the RAMBlock. As we will never migrate any page of that 1GB chunk,
> > we will not actually clear the dirty bitmap of the memory region. When
> > re-syncing, we will set all bits in the dirty bmap again from the dirty
> > bitmap in the memory region. Thus, many of our hints end up being mostly
> > ignored. The smaller the clear bmap chunk, the more extreme the issue.
> 
> OK, that looks possible. We need to clear the related bits from the memory
> region bitmap before skipping the free pages. Could you try with the below
> patch:
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index ace8ad431c..a1f6df3e6c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -811,6 +811,26 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
>      return next;
>  }
> 
> +
> +static void migration_clear_memory_region_dirty_bitmap(RAMState *rs,
> +                                                       RAMBlock *rb,
> +                                                       unsigned long page)
> +{
> +    uint8_t shift;
> +    hwaddr size, start;
> +
> +    if (!rb->clear_bmap || !clear_bmap_test_and_clear(rb, page))
> +        return;
> +
> +    shift = rb->clear_bmap_shift;
> +    assert(shift >= 6);
> +
> +    size = 1ULL << (TARGET_PAGE_BITS + shift);
> +    start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
> +    trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
> +    memory_region_clear_dirty_bitmap(rb->mr, start, size);
> +}
> +
>  static inline bool migration_bitmap_clear_dirty(RAMState *rs,
>                                                  RAMBlock *rb,
>                                                  unsigned long page)
> @@ -827,26 +847,9 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
>       * the page in the chunk we clear the remote dirty bitmap for all.
>       * Clearing it earlier won't be a problem, but too late will.
>       */
> -    if (rb->clear_bmap && clear_bmap_test_and_clear(rb, page)) {
> -        uint8_t shift = rb->clear_bmap_shift;
> -        hwaddr size = 1ULL << (TARGET_PAGE_BITS + shift);
> -        hwaddr start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
> -
> -        /*
> -         * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
> -         * can make things easier sometimes since then start address
> -         * of the small chunk will always be 64 pages aligned so the
> -         * bitmap will always be aligned to unsigned long. We should
> -         * even be able to remove this restriction but I'm simply
> -         * keeping it.
> -         */
> -        assert(shift >= 6);
> -        trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
> -        memory_region_clear_dirty_bitmap(rb->mr, start, size);
> -    }
> +    migration_clear_memory_region_dirty_bitmap(rs, rb, page);
> 
>      ret = test_and_clear_bit(page, rb->bmap);
> -
>      if (ret) {
>          rs->migration_dirty_pages--;
>      }
> @@ -2746,7 +2749,7 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
>  {
>      RAMBlock *block;
>      ram_addr_t offset;
> -    size_t used_len, start, npages;
> +    size_t used_len, start, npages, page_to_clear, i = 0;
>      MigrationState *s = migrate_get_current();
> 
>      /* This function is currently expected to be used during live migration */
> @@ -2775,6 +2778,19 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
>          start = offset >> TARGET_PAGE_BITS;
>          npages = used_len >> TARGET_PAGE_BITS;
> 
> +        /*
> +         * The skipped free pages are equivalent to being sent from
> +         * clear_bmap's perspective, so clear the bits from the memory
> +         * region bitmap which are initially set. Otherwise those skipped
> +         * pages will be sent in the next round after syncing from the
> +         * memory region bitmap.
> +         */
> +        do {
> +            page_to_clear = start + (i++ << block->clear_bmap_shift);
Why does "i" need to be shifted?

> +            migration_clear_memory_region_dirty_bitmap(ram_state,
> +                                                       block,
> +                                                       page_to_clear);
> +        } while (i <= npages >> block->clear_bmap_shift);

I agree with David that this would be better put into the mutex section; if so,
we'd also want to touch up the comment for bitmap_mutex. Or is there a reason
to explicitly not do so?

> +
>          qemu_mutex_lock(&ram_state->bitmap_mutex);
>          ram_state->migration_dirty_pages -=
>                        bitmap_count_one_with_offset(block->bmap, start, npages);

After my patch (which moves the mutex out of migration_bitmap_clear_dirty()),
maybe we can call migration_bitmap_clear_dirty() directly here rather than
introducing a new helper? It'll do all the dirty/clear bmap ops, including
dirty page accounting.

Thanks,

-- 
Peter Xu
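
PS: to be concrete, something like below is roughly what I had in mind - only
a sketch, untested; it assumes the loop lives inside the bitmap_mutex critical
section and sits on top of my series that moves the mutex out of
migration_bitmap_clear_dirty():

        qemu_mutex_lock(&ram_state->bitmap_mutex);
        for (i = 0; i < npages; i++) {
            /*
             * Clears the clear_bmap chunk (if still set), the block->bmap
             * bit, and updates migration_dirty_pages in one go.
             */
            migration_bitmap_clear_dirty(ram_state, block, start + i);
        }
        qemu_mutex_unlock(&ram_state->bitmap_mutex);

That would also make the separate bitmap_count_one_with_offset() accounting
quoted above redundant, since the helper already does the bookkeeping per page.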