On Thu, Jul 22, 2021 at 01:43:41PM +0200, David Hildenbrand wrote:
> > > a) In precopy code, always clearing all dirty bits from the bitmap that
> > >    correspond to discarded range, whenever we update the dirty bitmap.
> > >    This results in logically unplugged memory to never get migrated.
> >
> > Have you seen cases where discarded areas are being marked as dirty?
> > That suggests something somewhere is writing to them and shouldn't be.
>
> I have due to sub-optimal clear_bmap handling to be sorted out by
>
> https://lkml.kernel.org/r/20210722083055.23352-1-wei.w.w...@intel.com
>
> Whereby the issue is rather that initially dirty bits don't get cleared in
> lower layers and keep popping up as dirty.
>
> The issue with postcopy recovery code setting discarded ranges dirty in
> the dirty bitmap, I did not try reproducing. But from looking at the
> code, it's pretty clear that it would happen.
>
> Apart from that, nothing should dirty that memory. Of course,
> malicious guests could trigger it for now, in which case we wouldn't catch it
> and migrate such pages with postcopy, because the final bitmap sync in
> ram_postcopy_send_discard_bitmap() is performed without calling notifiers
> right now.

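For reference, a rough sketch of what (a) amounts to, in plain C rather than
actual QEMU code. The struct, the helper names and the page-granular ranges
are all made up for illustration; the real series presumably works on the
RAMBlock dirty bitmap instead.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for one discarded range, in page units. */
struct discarded_range {
    size_t first_page;
    size_t npages;
};

static void clear_bit_at(unsigned long *bmap, size_t nr)
{
    bmap[nr / BITS_PER_WORD] &= ~(1UL << (nr % BITS_PER_WORD));
}

static int test_bit_at(const unsigned long *bmap, size_t nr)
{
    return (bmap[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1;
}

/*
 * Meant to run after every dirty bitmap update: drop the dirty bits of all
 * discarded pages so they are never queued for migration.
 * Returns how many bits were actually cleared.
 */
static size_t clear_discarded_dirty(unsigned long *dirty,
                                    const struct discarded_range *ranges,
                                    size_t nranges)
{
    size_t cleared = 0;

    for (size_t i = 0; i < nranges; i++) {
        size_t end = ranges[i].first_page + ranges[i].npages;

        for (size_t p = ranges[i].first_page; p < end; p++) {
            if (test_bit_at(dirty, p)) {
                clear_bit_at(dirty, p);
                cleared++;
            }
        }
    }
    return cleared;
}
```

If nothing writes to discarded memory, the return value should be zero on
every sync after the first one, which is really the point of the question
that follows.
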
I share Dave's concern: does this mean we don't need to touch at least
ramblock_sync_dirty_bitmap() in patch 3?  Doing that for bitmap init and
postcopy recovery looks right.

One other trivial comment: instead of touching up ram_dirty_bitmap_reload(),
IMHO it's simpler to set all 1's for the discarded memory in the dst
receivedmap.  Imagine multiple postcopy recoveries happen; with that, we walk
the discarded memory list only once per migration.  Not a big deal, though.

-- 
Peter Xu
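
P.S. To illustrate the receivedmap idea, a minimal standalone sketch, not
QEMU code. It assumes that on recovery the source rebuilds its dirty bitmap
as the complement of the receivedmap sent back by the destination; the
struct and function names here are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for one discarded range, in page units. */
struct discarded_span {
    size_t first_page;
    size_t npages;
};

/*
 * Destination side, once per migration: pretend every discarded page has
 * already been received by setting its bit in the receivedmap.
 */
static void receivedmap_mark_discarded(unsigned long *received,
                                       const struct discarded_span *spans,
                                       size_t nspans)
{
    for (size_t i = 0; i < nspans; i++) {
        size_t end = spans[i].first_page + spans[i].npages;

        for (size_t p = spans[i].first_page; p < end; p++) {
            received[p / WORD_BITS] |= 1UL << (p % WORD_BITS);
        }
    }
}

/*
 * Source side, on every recovery (assumed behaviour): the new dirty bitmap
 * is the complement of the receivedmap. No walk of the discarded list is
 * needed here, because those bits are already set in the receivedmap.
 */
static void dirty_from_receivedmap(unsigned long *dirty,
                                   const unsigned long *received,
                                   size_t npages)
{
    size_t nwords = (npages + WORD_BITS - 1) / WORD_BITS;
    size_t tail = npages % WORD_BITS;

    for (size_t i = 0; i < nwords; i++) {
        dirty[i] = ~received[i];
    }
    /* Keep bits beyond the last valid page clear. */
    if (tail) {
        dirty[nwords - 1] &= (1UL << tail) - 1;
    }
}
```

With this, the per-recovery path stays a plain complement, and the cost of
walking the discarded ranges is paid once on the destination.
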