Hi! I hit a problem with migration which I would like to know how to fix.
I run the source QEMU like this:

    ./qemu-system-ppc64 -enable-kvm -m 1024 -machine pseries \
        -nographic -vga none

For the destination, I add "-incoming tcp:localhost:4000". Both run on the same POWER8 machine, the latest QEMU with my "hpratio=1" and "fix SLB migration" patches applied. The host kernel is 3.12 with a 64K system page size (it does not matter much, though). Since the source QEMU does not get any kernel, disk or network, it stays at the SLOF prompt. A very simple config.

Now I do migration. The first "bulk" iteration goes pretty quickly, all good. When it is done, we enter the "while (pending()) iterate()" loop in the migration_thread() function. The idea is that when the amount of outstanding changes becomes small enough to be transferred within the "maximum downtime" timeout, the migration will finish.

However, something a bit different happens in this configuration. For some reason (which I do not really know; I would like to, but it is irrelevant here) SLOF keeps dirtying a few pages (for example, 6), each 64K, which is up to 96 4K pages in QEMU terms (393216 bytes). Every time ram_save_pending() is called, 96 pages are dirty. This is not a huge number, but ram_save_iterate() moves the migration file pointer only 287544 bytes further, because that is what was actually transferred, and this number is smaller due to the "is_zero_range(p, TARGET_PAGE_SIZE)" optimization.

So migration_thread() gets the dirty page count and tries to send the pages in a loop, but every iteration resets the number of dirty pages back to 96 and we start again. After several tries we cross the BUFFER_DELAY timeout and calculate a new @max_size; if the host machine is fast enough, it becomes bigger than 393216 and the next loop iteration will finally finish the migration.

How can this misbehavior be fixed? I can only think of something simple like the patch below, and I am not sure it does not break other things.
I would expect ram_save_pending() to return the correct number of bytes QEMU is going to send rather than the number of pages multiplied by 4096, but checking whether all these pages are really empty is not too cheap either.

Thanks!

diff --git a/arch_init.c b/arch_init.c
index 2ba297e..90949b0 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -537,16 +537,17 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
                     acct_info.dup_pages++;
                 }
             }
         } else if (is_zero_range(p, TARGET_PAGE_SIZE)) {
             acct_info.dup_pages++;
             bytes_sent = save_block_hdr(f, block, offset, cont,
                                         RAM_SAVE_FLAG_COMPRESS);
             qemu_put_byte(f, 0);
+            qemu_update_position(f, TARGET_PAGE_SIZE);
             bytes_sent++;
         } else if (!ram_bulk_stage && migrate_use_xbzrle()) {
             current_addr = block->offset + offset;
             bytes_sent = save_xbzrle_page(f, p, current_addr, block,
                                           offset, cont, last_stage);
             if (!last_stage) {
                 p = get_cached_data(XBZRLE.cache, current_addr);
             }

--
Alexey