* Alexey Kardashevskiy (a...@ozlabs.ru) wrote:
> On 02/06/2014 03:45 AM, Paolo Bonzini wrote:
> > On 05/02/2014 17:42, Dr. David Alan Gilbert wrote:
> >> Because:
> >>   * the code is still running and keeps redirtying a small handful
> >>     of pages
> >>   * but because we've underestimated our available bandwidth we
> >>     never stop it and just throw those pages across immediately
> >
> > Ok, I thought Alexey was saying we are not redirtying that handful
> > of pages.
>
> Every iteration we read the dirty map from KVM and send all dirty pages
> across the stream.
>
> > And in turn, this is because the max downtime we have is too low
> > (especially for the default 32 MB/sec bandwidth; that's also pretty
> > low).
>
> My understanding now is that in order to finish migration QEMU waits
> for the earliest 100ms (BUFFER_DELAY) of continuously low traffic, but
> because those pages get dirtied every time we read the dirty map, we
> transfer more in those 100ms than we are actually allowed (>32MB/s,
> i.e. >3.2MB per 100ms). So we transfer-transfer-transfer, detect that
> we transferred too much, do delay(), and if max_size (calculated from
> the actual transfer rate and downtime) for the next iteration happens,
> by luck, to exceed those 96 pages (uncompressed) - we finish.
How about turning on some of the debug in migration.c; I suggest not all
of it, but how about:

    DPRINTF("transferred %" PRIu64 " time_spent %" PRIu64
            " bandwidth %g max_size %" PRId64 "\n",
            transferred_bytes, time_spent, bandwidth, max_size);

and also the s->dirty_bytes_rate value. It would help check our
assumptions.

> Increasing speed or/and downtime will help but still - we would not need
> that if migration did not expect all 96 pages to have to be sent but did
> have some smart way to detect that many are empty (so - compressed).

I think the other way would be to keep track of the compression ratio; if
we knew how many pages we'd sent, and how much bandwidth that had used, we
could divide pending_bytes by that to get a *different* approximation.
However, my understanding is that we're trying to _guarantee_ a maximum
downtime, and to do that we have to use the calculation that assumes all
the pages we have will take the maximum time to transfer, and only go into
downtime then.

> Literally, move is_zero_range() from ram_save_block() to
> migration_bitmap_sync() and store this bit in some new pages_zero_map,
> for example. But does it make a lot of sense?

The problem is that that means checking whether a page is zero more often;
at the moment we check it's zero once, during sending. To do what you're
suggesting we'd have to check every dirty page for zeroes on every sync,
and I think that's more often than we send.

Have you tried disabling the call to is_zero_range() in arch_init.c's
ram_save_block() so that (as long as you have XBZRLE off) we don't do any
compression? If the theory is right, your problem should go away.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK