On Tue, Jul 13, 2021 at 08:40:21AM +0000, Wang, Wei W wrote:
> On Thursday, July 1, 2021 4:08 AM, Peter Xu wrote:
> > Taking the mutex every time for each dirty bit to clear is too slow,
> > especially since we take/release it even when the dirty bit is already
> > clear.  So far the lock is only used to synchronize special cases of
> > qemu_guest_free_page_hint() against the migration thread, nothing
> > really that serious yet.  Let's move the lock up to the callers.
> >
> > There are two callers of migration_bitmap_clear_dirty().
> >
> > For migration, move it into ram_save_iterate().  With the help of the
> > MAX_WAIT logic, we'll only run ram_save_iterate() for no more than
> > ~50ms at a time, so take the lock once there at the entry.  It also
> > means any call site of qemu_guest_free_page_hint() can be delayed; but
> > that should be very rare, only during migration, and I don't see a
> > problem with it.
> >
> > For COLO, move it up to colo_flush_ram_cache().  I think COLO forgot
> > to take that lock even when calling ramblock_sync_dirty_bitmap(),
> > whereas migration_bitmap_sync(), for example, takes it correctly.  So
> > let the mutex cover both the ramblock_sync_dirty_bitmap() and
> > migration_bitmap_clear_dirty() calls.
> >
> > It would even be possible to drop the lock and use atomic operations
> > on rb->bmap and the variable migration_dirty_pages.  I didn't do that,
> > just to stay on the safe side; it's also hard to predict whether the
> > frequent atomic ops would bring overhead of their own, e.g. on huge
> > VMs where this happens very often.  When that really becomes a
> > problem, we can keep a local counter and flush it with atomic ops
> > periodically.  Keep it simple for now.
> >
> > Cc: Wei Wang <wei.w.w...@intel.com>
> > Cc: David Hildenbrand <da...@redhat.com>
> > Cc: Hailiang Zhang <zhang.zhanghaili...@huawei.com>
> > Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>
> > Cc: Juan Quintela <quint...@redhat.com>
> > Cc: Leonardo Bras Soares Passos <lsoar...@redhat.com>
> > Signed-off-by: Peter Xu <pet...@redhat.com>
>
> FWIW
> Reviewed-by: Wei Wang <wei.w.w...@intel.com>
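For reference, the shape of the change is roughly the below (a simplified
sketch, not the literal migration/ram.c hunks: the function and field
names are the real ones from the patch description, but the bodies are
trimmed down to just the locking pattern):

    /* Before: the helper took the mutex around every single bit clear */
    static inline bool migration_bitmap_clear_dirty(RAMState *rs,
                                                    RAMBlock *rb,
                                                    unsigned long page)
    {
        bool ret;

        qemu_mutex_lock(&rs->bitmap_mutex);
        ret = test_and_clear_bit(page, rb->bmap);
        if (ret) {
            rs->migration_dirty_pages--;
        }
        qemu_mutex_unlock(&rs->bitmap_mutex);
        return ret;
    }

    /* After: the helper expects the caller to already hold bitmap_mutex */
    static inline bool migration_bitmap_clear_dirty(RAMState *rs,
                                                    RAMBlock *rb,
                                                    unsigned long page)
    {
        bool ret = test_and_clear_bit(page, rb->bmap);

        if (ret) {
            rs->migration_dirty_pages--;
        }
        return ret;
    }

    /*
     * ...while ram_save_iterate() and colo_flush_ram_cache() take
     * bitmap_mutex once around their whole scan, so that
     * qemu_guest_free_page_hint() still serializes against them, just
     * at a coarser granularity.
     */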
Thanks, Wei.

> If no one could help do a regression test of free page hints, please
> document something like this in the patch:
>
> Locking at the coarser granularity may reduce the improvement brought
> by free page hints, but it does not seem to cause critical issues.  We
> will let users of free page hints report back any requirements and come
> up with a better solution later.

Didn't get a chance to document it as it's in a pull now; but as long as
you're okay with the no-per-page-lock approach (I still don't agree we
need the per-page lock), I can follow this up.

Are the parameters below enough to enable free-page-hint?

  -object iothread,id=io1 \
  -device virtio-balloon,free-page-hint=on,iothread=io1 \

I tried to verify that it's running by tracing report_free_page_func()
inside the guest with a kprobe, but it never triggered.  The guest runs
kernel 5.11.12-300.fc34.x86_64, which should be recent enough to support
this.  Do you know what I'm missing?

P.S. This also reminded me that maybe we want an entry in qemu-options.hx
for the balloon device, as it has lots of options and some of them may
not be easy to set up right.

--
Peter Xu