On 23.07.21 18:12, Peter Xu wrote:
On Thu, Jul 22, 2021 at 01:43:41PM +0200, David Hildenbrand wrote:
a) In precopy code, always clearing all dirty bits from the bitmap that
correspond to discarded ranges, whenever we update the dirty bitmap. This
results in logically unplugged memory never getting migrated.
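
Just to make (a) concrete, a rough sketch of such an exclusion helper on
top of the RamDiscardManager replay machinery (migration/ram.c context
assumed; exact names/signatures may differ from the actual patches):

static void dirty_bitmap_clear_section(MemoryRegionSection *section,
                                       void *opaque)
{
    const unsigned long start = section->offset_within_region >>
                                TARGET_PAGE_BITS;
    const unsigned long npages = int128_get64(section->size) >>
                                 TARGET_PAGE_BITS;
    RAMBlock *rb = section->mr->ram_block;

    /* Discarded ranges must not be migrated: drop their dirty bits. */
    bitmap_clear(rb->bmap, start, npages);
}

static void ramblock_dirty_bitmap_exclude_discarded_pages(RAMBlock *rb)
{
    if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
        RamDiscardManager *rdm =
            memory_region_get_ram_discard_manager(rb->mr);
        MemoryRegionSection section = {
            .mr = rb->mr,
            .offset_within_region = 0,
            .size = rb->mr->size,
        };

        /* Replay all currently discarded ranges, clearing their bits. */
        ram_discard_manager_replay_discarded(rdm, &section,
                                             dirty_bitmap_clear_section,
                                             NULL);
    }
}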

Have you seen cases where discarded areas are being marked as dirty?
That suggests something somewhere is writing to them and shouldn't be.

I have, due to the sub-optimal clear_bmap handling that is to be sorted
out by

https://lkml.kernel.org/r/20210722083055.23352-1-wei.w.w...@intel.com

The issue there is rather that initially-dirty bits don't get cleared in
the lower layers and keep popping up as dirty.

I did not try reproducing the issue of the postcopy recovery code setting
discarded ranges dirty in the dirty bitmap. But from looking at the code,
it's pretty clear that it would happen.

Apart from that, nothing should dirty that memory. Of course, a malicious
guest could trigger it for now, in which case we wouldn't catch it and
would migrate such pages with postcopy, because the final bitmap sync in
ram_postcopy_send_discard_bitmap() is performed without calling the
notifiers right now.
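
If we ever want to catch that as well, I'd assume it boils down to running
the same exclusion right after that final sync -- sketch only:

    RAMBlock *block;

    /* In ram_postcopy_send_discard_bitmap(): our last sync, the src is
     * now paused. */
    migration_bitmap_sync(rs);

    RAMBLOCK_FOREACH_NOT_IGNORED(block) {
        /* Drop whatever a malicious guest dirtied in discarded ranges. */
        ramblock_dirty_bitmap_exclude_discarded_pages(block);
    }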

I have the same concern as Dave: does it mean that we don't need to touch
at least ramblock_sync_dirty_bitmap() in patch 3?

Yes, see the comment in patch #3:

"
Note: If discarded ranges span complete clear_bmap chunks, we'll never
clear the corresponding bits from clear_bmap and consequently never call
memory_region_clear_dirty_bitmap on the affected regions. While this is
perfectly fine, we're still synchronizing the bitmap of discarded ranges,
for example, in
ramblock_sync_dirty_bitmap()->cpu_physical_memory_sync_dirty_bitmap()
but also during memory_global_dirty_log_sync().

In the future, it might make sense to never even synchronize the dirty log
of these ranges, for example in KVM code, skipping discarded ranges
completely.
"

The KVM path might be even more interesting (with !dirty ring IIRC).

So that would certainly be worth looking into if we find it to be a real
performance problem.
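
FWIW, to make the chunk arithmetic from the quoted note concrete (pure
illustration, not part of the series; with the default
x-clear-bitmap-shift of 18 and 4 KiB target pages, one clear_bmap bit
covers a 1 GiB chunk):

static bool clear_bmap_bit_hidden_by_discard(uint64_t bit, uint8_t shift,
                                             uint64_t discard_start_page,
                                             uint64_t discard_npages)
{
    const uint64_t chunk_start = bit << shift;
    const uint64_t chunk_pages = 1ULL << shift;

    /*
     * If the whole chunk is discarded, migration never sends a page
     * from it, so clear_bmap_test_and_clear() never fires for the chunk
     * and memory_region_clear_dirty_bitmap() is never called for it --
     * while bitmap syncs still cover the range.
     */
    return discard_start_page <= chunk_start &&
           discard_start_page + discard_npages >= chunk_start + chunk_pages;
}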


Doing that for bitmap init and postcopy recovery looks right.

One other trivial comment: instead of touching up ram_dirty_bitmap_reload(),
IMHO it's simpler to set all 1's for discarded memory in the dst
receivedmap; imagine multiple postcopy recoveries happen, then with that we
walk the discarded memory list only once per migration.  Not a big deal,
though.

Right, but I decided to reuse ramblock_dirty_bitmap_exclude_discarded_pages()
so that I can avoid yet another helper.
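
For reference, that receivedmap variant would presumably boil down to one
more replay callback along these lines (illustrative only) -- exactly the
kind of additional helper I'd like to avoid:

static void receivedmap_set_section(MemoryRegionSection *section,
                                    void *opaque)
{
    RAMBlock *rb = section->mr->ram_block;
    const unsigned long start = section->offset_within_region >>
                                TARGET_PAGE_BITS;
    const unsigned long npages = int128_get64(section->size) >>
                                 TARGET_PAGE_BITS;

    /* Pretend the (never migrated) discarded pages already arrived. */
    bitmap_set(rb->receivedmap, start, npages);
}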

--
Thanks,

David / dhildenb

