On Thursday, July 1, 2021 4:08 AM, Peter Xu wrote:
> Taking the mutex every time for each dirty bit to clear is too slow,
> especially since we'll take/release it even when the dirty bit is already
> cleared. So far it's only used to sync with special cases of
> qemu_guest_free_page_hint() against the migration thread, nothing really
> that serious yet. Let's move the lock up a level.
>
> There are two callers of migration_bitmap_clear_dirty().
>
> For migration, move it into ram_save_iterate(). With the help of the
> MAX_WAIT logic, we'll only run ram_save_iterate() for no more than 50ms-ish
> at a time, so take the lock once there at the entry. It also means any call
> sites of qemu_guest_free_page_hint() can be delayed; but that should be very
> rare, only during migration, and I don't see a problem with it.
>
> For COLO, move it up to colo_flush_ram_cache(). I think COLO forgot to take
> that lock even when calling ramblock_sync_dirty_bitmap(), whereas
> migration_bitmap_sync() takes it correctly. So let the mutex cover both the
> ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls.
>
> It would even be possible to drop the lock and use atomic operations on
> rb->bmap and the variable migration_dirty_pages. I didn't do that, just to
> stay on the safe side; it's also hard to predict whether the frequent atomic
> ops would bring their own overhead, e.g. on huge VMs where this happens very
> often. When that really becomes an issue, we can keep a local counter and
> periodically call atomic ops. Keep it simple for now.
>
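For context, the locking change being described boils down to the difference
below. This is only an illustrative sketch with made-up helper names
(clear_dirty_per_page, save_iteration), not the actual ram.c code:

#include "qemu/osdep.h"
#include "qemu/thread.h"
#include "qemu/bitops.h"

/* Before: the mutex is taken and released for every single page cleared. */
static void clear_dirty_per_page(QemuMutex *bitmap_mutex,
                                 unsigned long *bmap, unsigned long page)
{
    qemu_mutex_lock(bitmap_mutex);
    clear_bit(page, bmap);
    qemu_mutex_unlock(bitmap_mutex);
}

/* After: the caller (ram_save_iterate() / colo_flush_ram_cache()) holds the
 * mutex once around the whole bounded iteration, so the per-page path does
 * no locking at all. */
static void save_iteration(QemuMutex *bitmap_mutex,
                           unsigned long *bmap, unsigned long npages)
{
    unsigned long page;

    qemu_mutex_lock(bitmap_mutex);
    for (page = 0; page < npages; page++) {
        clear_bit(page, bmap);
    }
    qemu_mutex_unlock(bitmap_mutex);
}
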
If the free page optimization is enabled, might a 50ms wait be too long for
handling just one hint (via qemu_guest_free_page_hint)? How about taking the
lock conditionally? e.g.

#define QEMU_LOCK_GUARD_COND(lock, cond) { \
    if (cond)                              \
        QEMU_LOCK_GUARD(lock);             \
}

Then in migration_bitmap_clear_dirty:

    QEMU_LOCK_GUARD_COND(&rs->bitmap_mutex, rs->fpo_enabled);

Best,
Wei
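One wrinkle with the sketch above: wrapping QEMU_LOCK_GUARD in an if plus
braces would scope the guard to those braces, so the mutex would be released
immediately rather than at the end of the calling function. A minimal sketch
of one way a conditional guard could be built, using plain QemuMutex calls
and the GCC/Clang cleanup attribute (which QEMU already relies on); the names
CondLockGuard, cond_lock_guard_take/release and QEMU_LOCK_GUARD_COND are
hypothetical, not an existing QEMU API:

#include "qemu/osdep.h"
#include "qemu/thread.h"

typedef struct CondLockGuard {
    QemuMutex *lock;    /* NULL means the lock was not taken */
} CondLockGuard;

static inline CondLockGuard cond_lock_guard_take(QemuMutex *lock, bool cond)
{
    CondLockGuard g = { .lock = NULL };

    if (cond) {
        qemu_mutex_lock(lock);
        g.lock = lock;
    }
    return g;
}

static inline void cond_lock_guard_release(CondLockGuard *g)
{
    if (g->lock) {
        qemu_mutex_unlock(g->lock);
    }
}

/* Takes 'lock' only when 'cond' is true; released automatically at the end
 * of the enclosing scope (one guard per scope in this simple form). */
#define QEMU_LOCK_GUARD_COND(lock, cond)                        \
    __attribute__((cleanup(cond_lock_guard_release)))           \
    CondLockGuard cond_lock_guard_ =                            \
        cond_lock_guard_take((lock), (cond))

With something like that, migration_bitmap_clear_dirty() could start with
QEMU_LOCK_GUARD_COND(&rs->bitmap_mutex, rs->fpo_enabled) and skip the mutex
entirely whenever free page hinting is off.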