On Wednesday, July 7, 2021 1:40 AM, Peter Xu wrote:
> On Tue, Jul 06, 2021 at 12:05:49PM +0200, David Hildenbrand wrote:
> > On 06.07.21 11:41, Wang, Wei W wrote:
> > > On Monday, July 5, 2021 9:42 PM, David Hildenbrand wrote:
> > > > On 03.07.21 04:53, Wang, Wei W wrote:
> > > > > On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
> > > > > > On 02.07.21 04:48, Wang, Wei W wrote:
> > > > > > > On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
> > > > > > > > On 01.07.21 14:51, Peter Xu wrote:
> > > > > >
> > > > > > I think that clearly shows the issue.
> > > > > >
> > > > > > My theory, which I did not verify yet: Assume we have 1GB chunks in
> > > > > > the clear bmap. Assume the VM reports all pages within a 1GB chunk
> > > > > > as free (easy with a fresh VM). While processing hints, we will
> > > > > > clear the bits from the dirty bmap in the RAMBlock. As we will never
> > > > > > migrate any page of that 1GB chunk, we will not actually clear the
> > > > > > dirty bitmap of the memory region. When re-syncing, we will set all
> > > > > > bits in the dirty bmap again from the dirty bitmap in the memory
> > > > > > region. Thus, many of our hints end up being mostly ignored. The
> > > > > > smaller the clear bmap chunk, the more extreme the issue.
> > > > >
> > > > > OK, that looks possible. We need to clear the related bits from the
> > > > > memory region bitmap before skipping the free pages. Could you try
> > > > > with the below patch:
> > > >
> > > > I did a quick test (with the memhog example) and it seems like it
> > > > mostly works. However, we're now doing the bitmap clearing from another
> > > > thread, racing with the migration thread. Especially:
> > > >
> > > > 1. Racing with clear_bmap_set() via
> > > >    cpu_physical_memory_sync_dirty_bitmap()
> > > > 2. Racing with migration_bitmap_clear_dirty()
> > > >
> > > > So that might need some thought, if I'm not wrong.
> > > I think this is similar to the non-clear_bmap case, where rb->bmap is
> > > set or cleared by the migration thread and qemu_guest_free_page_hint.
> > > For example, the migration thread could find a bit set in rb->bmap
> > > before qemu_guest_free_page_hint gets that bit cleared in time. The
> > > result is that the free page is transferred, which isn't necessary, but
> > > won't cause any issue. This is very rare in practice.
> >
> > Here are my concerns regarding races:
> >
> > a) We now end up calling migration_clear_memory_region_dirty_bitmap()
> > without holding the bitmap_mutex. We have to clarify if that is ok. At
> > least migration_bitmap_clear_dirty() holds it *while* clearing the log,
> > and migration_bitmap_sync() holds it while setting bits in the
> > clear_bmap. I think we also have to hold the mutex on the new path.
>
> Makes sense; I think we can let bitmap_mutex protect both the dirty and
> clear bitmaps, and also the dirty pages counter. I'll comment in Wei's
> patch too later.
Btw, what would you think if we change the mutex to QemuSpin? That would
also reduce the overhead, I think.

Best,
Wei