Daniel P. Berrangé <berra...@redhat.com> wrote:
> The migration test cases that actually exercise live migration want to
> ensure there is a minimum of two iterations of pre-copy, in order to
> exercise the dirty tracking code.
>
> Historically we've queried the migration status, looking for the
> 'dirty-sync-count' value to increment to track iterations. This was
> not entirely reliable because often all the data would get transferred
> quickly enough that the migration would finish before we wanted it
> to. So we massively dropped the bandwidth and max downtime to
> guarantee non-convergence. This had the unfortunate side effect
> that every migration took at least 30 seconds to run (100 MB of
> dirty pages / 3 MB/sec).
>
> This optimization takes a different approach to ensuring a
> minimum of two iterations. Rather than waiting for dirty-sync-count
> to increment, it directly looks for an indication that the source VM
> has dirtied RAM that has already been transferred.
>
> On the source VM a magic marker is written just after the 3 MB
> offset. The destination VM is then monitored to detect when the
> magic marker is transferred. This gives a guarantee that the
> first 3 MB of memory have been transferred. Now the source VM
> memory is monitored at exactly the 3 MB offset until we observe
> a flip in its value. This gives us a guarantee that the guest
> workload has dirtied a byte that has already been transferred.
>
> Since we're looking at a place that is only 3 MB from the start
> of memory, with the 3 MB/sec bandwidth this test should complete
> in 1 second, instead of 30 seconds.
>
> Once we've proved there is some dirty memory, migration can be
> set back to full speed for the remainder of the first iteration,
> and the entirety of the second iteration, at which point migration
> should be complete.
>
> Signed-off-by: Daniel P. Berrangé <berra...@redhat.com>
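
For illustration, the marker check described above might look roughly like
the following in a qtest-based helper (the watch offset, marker value and
helper shape here are assumptions made for this sketch, not the actual
patch code):

#include <unistd.h>
#include "libqtest.h"

/* Wait until the guest has dirtied a byte that was already sent.
 * The 3 MB watch offset and marker value passed in are assumptions
 * for this sketch, not the values used by the actual patch. */
static void wait_for_dirty_mem(QTestState *from, QTestState *to,
                               uint64_t watch_addr, uint8_t marker)
{
    /* The destination sees the magic marker once the first 3 MB of RAM
     * (everything up to the watch address) has been transferred. */
    while (qtest_readb(to, watch_addr) != marker) {
        usleep(1000);
    }

    /* Now watch the same offset on the source until its value flips,
     * proving the workload dirtied memory that was already sent. */
    uint8_t initial = qtest_readb(from, watch_addr);
    while (qtest_readb(from, watch_addr) == initial) {
        usleep(1000);
    }
}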
Hi

I think this is not enough. As said before:

- xbzrle needs 3 iterations
- auto-converge needs around 12 iterations (I forgot the exact number,
  but it is a lot)
- for (almost) all the rest of the tests, we don't really care, we just
  need the migration to finish

One easy way to "test" it is: change the "meaning" of ZERO downtime to
mean that we don't want to enter the completion stage, just continue
sending data.

Changing this in qemu:

modified   migration/migration.c
@@ -2726,6 +2726,9 @@ static MigIterateState migration_iteration_run(MigrationState *s)
 
     trace_migrate_pending_estimate(pending_size, must_precopy, can_postcopy);
 
+    if (s->threshold_size == 0) {
+        return MIG_ITERATE_RESUME;
+    }
     if (must_precopy <= s->threshold_size) {
         qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
         pending_size = must_precopy + can_postcopy;

And just setting the downtime to zero should be enough.

It is too late, so before I start with this, what do you think?

Later, Juan.
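
For illustration, a test driving the proposed zero-downtime behaviour might
look roughly like this (a sketch only; helper names such as
migrate_set_parameter_int, migrate_qmp and read_ram_property_int are
assumptions about the qtest helpers, not confirmed API):

#include <unistd.h>
#include "libqtest.h"
#include "migration-helpers.h"   /* assumed location of the migrate_* helpers */

/* Run migration without converging until it has done 'iterations'
 * pre-copy passes, then let it finish.  Relies on the proposed
 * "downtime-limit == 0 means never enter completion" semantics. */
static void migrate_for_iterations(QTestState *from, const char *uri,
                                   int iterations)
{
    /* With the patch above, a zero downtime keeps migration iterating. */
    migrate_set_parameter_int(from, "downtime-limit", 0);
    migrate_qmp(from, uri, "{}");

    /* dirty-sync-count is assumed readable from query-migrate:
     * 3 passes for xbzrle, ~12 for auto-converge, 1 for everything else. */
    while (read_ram_property_int(from, "dirty-sync-count") < iterations) {
        usleep(100 * 1000);
    }

    /* Restore a real downtime limit so migration can complete. */
    migrate_set_parameter_int(from, "downtime-limit", 300);
}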