This is v5 of my patch series with various optimizations in zero buffer checking and migration tweaks.
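
For readers new to the series, the zero buffer check can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patches: it uses plain 64-bit words instead of the vector types the series relies on, and every sketch_* name and SKETCH_UNROLL_FACTOR are hypothetical. It shows three ideas from the changelog: a separate check function for the preconditions, a first loop that is not unrolled so a non-zero page is rejected quickly, and an unrolled main loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical unroll factor; the series defines its own constant. */
#define SKETCH_UNROLL_FACTOR 8
#define SKETCH_CHUNK (SKETCH_UNROLL_FACTOR * sizeof(uint64_t))

/* Precondition check: the length must be a non-zero multiple of one chunk.
 * (Assumption: alignment is not required here because words are read
 * via memcpy.) */
static int sketch_can_use_find_nonzero_offset(const void *buf, size_t len)
{
    (void)buf;
    return len != 0 && len % SKETCH_CHUNK == 0;
}

/* Returns the offset of the first chunk that contains a non-zero word,
 * or len if the whole buffer is zero. */
static size_t sketch_find_nonzero_offset(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t w[SKETCH_UNROLL_FACTOR];
    size_t i, j;

    assert(sketch_can_use_find_nonzero_offset(buf, len));

    /* First chunk, word by word: a non-zero page usually fails here. */
    memcpy(w, p, sizeof(w));
    for (j = 0; j < SKETCH_UNROLL_FACTOR; j++) {
        if (w[j]) {
            return 0;
        }
    }

    /* Remaining chunks: OR all words of a chunk, then test once. */
    for (i = SKETCH_CHUNK; i < len; i += SKETCH_CHUNK) {
        memcpy(w, p + i, sizeof(w));
        if (w[0] | w[1] | w[2] | w[3] | w[4] | w[5] | w[6] | w[7]) {
            return i;
        }
    }
    return len;
}

/* Uses the fast path when the preconditions hold, else a bytewise scan. */
static int sketch_buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t i;

    if (sketch_can_use_find_nonzero_offset(buf, len)) {
        return sketch_find_nonzero_offset(buf, len) == len;
    }
    for (i = 0; i < len; i++) {
        if (p[i]) {
            return 0;
        }
    }
    return 1;
}
```

For a memory page the preconditions always hold, so a caller could compare the result of sketch_find_nonzero_offset() against the page size directly.
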
Thanks especially to Eric Blake, Orit Wassermann and Paolo Bonzini for reviewing.

v5:
 - move zero splat vector to a different patch
 - fix indentation of can_use_buffer_find_nonzero_offset()
 - do not unroll the first loop in buffer_find_nonzero_offset() to
   optimize it for zero page checking
 - use an older unrolled version of find_next_bit() without SIMD
   instructions, as there is no evidence that the vectorized version
   is better (it may even be worse) and the code is easier to understand
 - added a word in the commit message of patch 8 about the skipped
   pages field in QMP MigrationStats
 - fixed the order of key-value pairs of MigrationStats in
   qapi-schema.json
 - updated info about the performance benefit of is_zero_page() to the
   latest benchmark results in the commit message

v4:
 - do not inline buffer_find_nonzero_offset()
 - inline can_use_buffer_find_nonzero_offset() correctly
 - re-add asserts in buffer_find_nonzero_offset() as profiling shows
   they do not hurt
 - replace the last occurrences of scalar 8 with
   BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
 - avoid dereferencing p already in patch 5, where we know that the
   page (p) is zero
 - explicitly set bytes_sent = 0 if we skip a zero page. bytes_sent was
   0 before, but it was not obvious
 - add accounting information for skipped zero pages
 - fix errors reported by checkpatch.pl

v3:
 - remove asserts, inline functions and add a check function that tells
   whether buffer_find_nonzero_offset() can be used
 - use the above check function in buffer_is_zero() and find_next_bit()
 - use buffer_find_nonzero_offset() directly to find zero pages. We
   know that all requirements are met for memory pages
 - fix C89 violation in buffer_is_zero()
 - avoid dereferencing p in ram_save_block() if we already know the
   page is zero
 - fix initialization of last_offset in reset_ram_globals()
 - avoid skipping pages with offset == 0 in bulk stage in
   migration_bitmap_find_and_reset_dirty()
 - compared to v1, also check for zero pages after bulk ram migration,
   as there are guests (e.g. Windows) which zero out large amounts of
   memory while running

v2:
 - fix description, add trivial zero check and add asserts to
   buffer_find_nonzero_offset()
 - add a constant for the unroll factor of buffer_find_nonzero_offset()
 - replace is_dup_page() by buffer_is_zero()
 - added test results to xbzrle patch
 - optimize descriptions

Peter Lieven (10):
  move vector definitions to qemu-common.h
  add a zero splat vector to qemu-common.h
  cutils: add a function to find non-zero content in a buffer
  buffer_is_zero: use vector optimizations if possible
  bitops: unroll while loop in find_next_bit()
  migration: search for zero instead of dup pages
  migration: add an indicator for bulk state of ram migration
  migration: do not sent zero pages in bulk stage
  migration: do not search dirty pages in bulk stage
  migration: use XBZRLE only after bulk stage

 arch_init.c                   |   74 +++++++++++++++++++----------------------
 hmp.c                         |    2 ++
 include/migration/migration.h |    2 ++
 include/qemu-common.h         |   37 +++++++++++++++++++++
 migration.c                   |    3 +-
 qapi-schema.json              |    8 +++--
 qmp-commands.hx               |    3 +-
 util/bitops.c                 |   18 +++++++++-
 util/cutils.c                 |   60 +++++++++++++++++++++++++++++++++
 9 files changed, 162 insertions(+), 45 deletions(-)

-- 
1.7.9.5