QEMU currently accesses the dirty bitmaps very liberally, which is understandable since the accesses are cheap. However, this gets in the way of squeezing maximum performance out of dataplane, and it becomes a problem if the accesses grow more expensive, as they do when they use atomic primitives.
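Atomic accesses cost more for a simple reason. A plain bit set is a load, an OR and a store. That is cheap, but it can lose a concurrent update to the same word. An atomic set is a single read-modify-write, which typically costs more. The minimal C11 sketch below illustrates the difference. The sketch_* helpers and SKETCH_* macros are hypothetical and only loosely mirror the kind of atomic bitmap helpers this series adds. They are not QEMU code.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers: which word holds bit nr, and its mask within it. */
#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_WORD(nr)      ((nr) / SKETCH_BITS_PER_WORD)
#define SKETCH_MASK(nr)      (1UL << ((nr) % SKETCH_BITS_PER_WORD))

/* Plain set: load, OR, store.  Cheap, but if two threads set different
 * bits of the same word at once, one of the updates can be lost. */
static void sketch_set_bit(unsigned long *map, size_t nbits, size_t nr)
{
    assert(nr < nbits);
    map[SKETCH_WORD(nr)] |= SKETCH_MASK(nr);
}

/* Atomic set: a single read-modify-write of the word (a locked
 * instruction on x86), so no concurrent update is lost. */
static void sketch_set_bit_atomic(_Atomic unsigned long *map, size_t nbits,
                                  size_t nr)
{
    assert(nr < nbits);
    atomic_fetch_or(&map[SKETCH_WORD(nr)], SKETCH_MASK(nr));
}

/* Atomic test-and-clear: clears the bit and returns whether it was set.
 * A set that races with the clear is either reported now or left in the
 * bitmap for the next call, never silently dropped. */
static bool sketch_test_and_clear_bit_atomic(_Atomic unsigned long *map,
                                             size_t nbits, size_t nr)
{
    unsigned long old;

    assert(nr < nbits);
    old = atomic_fetch_and(&map[SKETCH_WORD(nr)], ~SKETCH_MASK(nr));
    return (old & SKETCH_MASK(nr)) != 0;
}
```

A reader such as a migration sync would use the test-and-clear form. Writers would use the atomic set. The plain set is only safe when a single thread owns the word.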
This patch series does the following optimizations and cleanups:

1) It lets KVM code treat migration as "just another dirty bitmap client" instead of needing the special log_global_start/log_global_stop callbacks. These callbacks remain in use in Xen and vhost. This removes code and avoids bugs such as the one fixed in commit 4cc856f (kvm-all: Sync dirty-bitmap from kvm before kvm destroy the corresponding dirty_bitmap, 2015-04-02).

2) It avoids modifications to unused dirty bitmaps: the code bitmap if TCG is disabled, the migration bitmap if no migration is in progress, and the VGA bitmap for regions other than VRAM.

On top of this, it makes dirty bitmap access atomic. I'm not including the patch that makes the migration thread synchronize the bitmap outside the big QEMU lock (thus removing the last source of jitter during the RAM copy phase of migration), but these patches enable it as well.

Patches 1-4 are cleanups to DIRTY_MEMORY_VGA users.

Patches 5-12 are the first cleanup (KVM treats migration as just another client).

Patches 13-14 are a simple optimization that these patches enable.

Patches 15-18 are bonus cleanups to translate-all.c's dirty memory tracking for TCG.

Patches 19-22 are the second cleanup (avoid modifications to unused dirty bitmaps).

Patches 23-28 are Stefan's patches for atomic access to the dirty bitmap. Thanks to the previous work, they have no performance impact in the common case.

Patch 29 is an unrelated strengthening of assertions that mst spotted while reviewing v1.
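The second cleanup can be pictured as checking a per-region mask of interested clients before touching any bitmap. The minimal C sketch below rests on simplifying assumptions: a single 64-page region, one uint64_t bitmap per client, and no atomics. The client names follow the cover letter, but their values here are illustrative. sketch_dirty_log_mask and sketch_set_dirty_range are hypothetical stand-ins, loosely modeled on memory_region_get_dirty_log_mask and cpu_physical_memory_set_dirty_range. They are not QEMU's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Client indices named after the cover letter; values are illustrative. */
enum {
    DIRTY_MEMORY_VGA,
    DIRTY_MEMORY_CODE,
    DIRTY_MEMORY_MIGRATION,
    DIRTY_MEMORY_NUM,
};

#define SKETCH_PAGES 64   /* hypothetical: one 64-bit bitmap per client */

/* One bitmap per client; bit N covers page N of the region. */
static uint64_t dirty[DIRTY_MEMORY_NUM];

/* Hypothetical: which clients currently care about writes to the region. */
static uint8_t sketch_dirty_log_mask(bool vga_logging, bool tcg_enabled,
                                     bool migrating)
{
    uint8_t mask = 0;

    if (vga_logging) {
        mask |= 1u << DIRTY_MEMORY_VGA;
    }
    if (tcg_enabled) {
        mask |= 1u << DIRTY_MEMORY_CODE;
    }
    if (migrating) {
        mask |= 1u << DIRTY_MEMORY_MIGRATION;
    }
    return mask;
}

/* Mark pages [first, first + n) dirty, but only in the bitmaps of the
 * clients in mask; bitmaps of other clients are never written. */
static void sketch_set_dirty_range(unsigned first, unsigned n, uint8_t mask)
{
    uint64_t bits;
    int client;

    assert(first + n <= SKETCH_PAGES);
    if (n == 0) {
        return;
    }
    bits = (n == 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1) << first;
    for (client = 0; client < DIRTY_MEMORY_NUM; client++) {
        if (mask & (1u << client)) {
            dirty[client] |= bits;
        }
    }
}
```

In this sketch, a write to a region whose mask is empty leaves every bitmap untouched. That is the effect the second cleanup aims for, and it is what keeps the cost of switching the bitmaps to atomic updates low in the common case.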
v2->v3:
- 22 patches reviewed by Fam
- fixed tcx24_update_display (patch 4, Fam)
- fixed comments (patch 5, Fam)
- improved commit messages (patches 7/10/11, Fam)
- fixed logic for full word loops (patches 23/24, Fam)
- avoid duplicated ~(ram_addr_t)0 constant (patch 29, Fam)

v1->v2:
- completed work on removing global_start/global_stop from KVM listener
- extra spelunking of TCG history so that the exec.c code makes more sense
- extra splitting of patches (Stefan)
- keep memory_region_is_logging and memory_region_get_dirty_log_mask APIs separate (mst)

Paolo Bonzini (23):
  memory: the only dirty memory flag for users is DIRTY_MEMORY_VGA
  g364fb: remove pointless call to memory_region_set_coalescing
  display: enable DIRTY_MEMORY_VGA tracking explicitly
  display: add memory_region_sync_dirty_bitmap calls
  memory: differentiate memory_region_is_logging and memory_region_get_dirty_log_mask
  memory: prepare for multiple bits in the dirty log mask
  framebuffer: check memory_region_is_logging
  ui/console: remove dpy_gfx_update_dirty
  memory: track DIRTY_MEMORY_CODE in mr->dirty_log_mask
  kvm: accept non-mapped memory in kvm_dirty_pages_log_change
  memory: include DIRTY_MEMORY_MIGRATION in the dirty log mask
  kvm: remove special handling of DIRTY_MEMORY_MIGRATION in the dirty log mask
  ram_addr: tweaks to xen_modified_memory
  exec: use memory_region_get_dirty_log_mask to optimize dirty tracking
  exec: move functions to translate-all.h
  translate-all: remove unnecessary argument to tb_invalidate_phys_range
  cputlb: remove useless arguments to tlb_unprotect_code_phys, rename
  translate-all: make less of tb_invalidate_phys_page_range depend on is_cpu_write_access
  exec: pass client mask to cpu_physical_memory_set_dirty_range
  exec: invert return value of cpu_physical_memory_get_clean, rename
  exec: only check relevant bitmaps for cleanliness
  memory: do not touch code dirty bitmap unless TCG is enabled
  memory: use mr->ram_addr in "is this RAM?" assertions

Stefan Hajnoczi (6):
  bitmap: add atomic set functions
  bitmap: add atomic test and clear
  memory: use atomic ops for setting dirty memory bits
  migration: move dirty bitmap sync to ram_addr.h
  memory: replace cpu_physical_memory_reset_dirty() with test-and-clear
  memory: make cpu_physical_memory_sync_dirty_bitmap() fully atomic

 arch_init.c                  |  46 +--------------
 cputlb.c                     |   7 +--
 exec.c                       |  99 +++++++++++++++----------------
 hw/display/cg3.c             |   2 +
 hw/display/exynos4210_fimd.c |  20 ++++---
 hw/display/framebuffer.c     |   4 ++
 hw/display/g364fb.c          |   3 +-
 hw/display/sm501.c           |   2 +
 hw/display/tcx.c             |   3 +
 hw/display/vmware_vga.c      |   2 +-
 hw/virtio/dataplane/vring.c  |   2 +-
 hw/virtio/vhost.c            |   9 ++-
 include/exec/cputlb.h        |   3 +-
 include/exec/exec-all.h      |   6 +-
 include/exec/memory.h        |  25 ++++++--
 include/exec/ram_addr.h      | 138 ++++++++++++++++++++++++++++---------------
 include/qemu/bitmap.h        |   4 ++
 include/qemu/bitops.h        |  14 +++++
 include/ui/console.h         |   4 --
 kvm-all.c                    |  77 ++++++------------------
 linux-user/mmap.c            |   7 ++-
 memory.c                     |  81 +++++++++++++++++--------
 translate-all.c              |  20 +++----
 translate-all.h              |   7 +++
 ui/console.c                 |  61 -------------------
 user-exec.c                  |   1 +
 util/bitmap.c                |  83 ++++++++++++++++++++++++++
 xen-hvm.c                    |  22 ++++---
 28 files changed, 408 insertions(+), 344 deletions(-)

-- 
1.8.3.1