On Thu, Apr 20, 2023 at 02:59:00PM +0200, Juan Quintela wrote: > Daniel P. Berrangé <berra...@redhat.com> wrote: > > There are 27 pre-copy live migration scenarios being tested. In all of > > these we force non-convergance and run for one iteration, then let it > > converge and wait for completion during the second (or following) > > iterations. At 3 mbps bandwidth limit the first iteration takes a very > > long time (~30 seconds). > > > > While it is important to test the migration passes and convergance > > logic, it is overkill to do this for all 27 pre-copy scenarios. The > > TLS migration scenarios in particular are merely exercising different > > code paths during connection establishment. > > > > To optimize time taken, switch most of the test scenarios to run > > non-live (ie guest CPUs paused) with no bandwidth limits. This gives > > a massive speed up for most of the test scenarios. > > > > For test coverage the following scenarios are unchanged > > > > * Precopy with UNIX sockets > > * Precopy with UNIX sockets and dirty ring tracking > > * Precopy with XBZRLE > > * Precopy with multifd > > Just for completeness: the other test that is still slow is > /migration/vcpu_dirty_limit. > > > - migrate_ensure_non_converge(from); > > + if (args->live) { > > + migrate_ensure_non_converge(from); > > + } else { > > + migrate_ensure_converge(from); > > + } > > Looks ... weird? > But the only way that I can think of improving it is to pass args to > migrate_ensure_*() and that is a different kind of weird.
I'm going to change this a little anyway. Currently for the non-live case, I start the migration and then stop the CPUs. I want to reverse that order, as we should have CPUs paused before even starting the migration to ensure we don't have any re-dirtied pages at all. > > > } else { > > - if (args->iterations) { > > - while (args->iterations--) { > > + if (args->live) { > > + if (args->iterations) { > > + while (args->iterations--) { > > + wait_for_migration_pass(from); > > + } > > + } else { > > wait_for_migration_pass(from); > > } > > + > > + migrate_ensure_converge(from); > > I think we should change iterations to be 1 when we create args, but > otherwise, treat 0 as 1 and change it to something in the lines of: > > if (args->live) { > while (args->iterations-- >= 0) { > wait_for_migration_pass(from); > } > migrate_ensure_converge(from); > > What do you think? I think in retrospect 'iterations' was overkill as we only use the values 0 (implicitly 1) or 2. IOW we could just just a 'bool multipass' to distinguish the two cases. > > - qtest_qmp_eventwait(to, "RESUME"); > > + if (!args->live) { > > + qtest_qmp_discard_response(to, "{ 'execute' : 'cont'}"); > > + } > > + if (!got_resume) { > > + qtest_qmp_eventwait(to, "RESUME"); > > + } > > > > wait_for_serial("dest_serial"); > > } > > I was looking at the "culprit" of Lukas problem, and it is not directly > obvious. I see that when we expect one event, we just drop any event > that we are not interested in. I don't know if that is the proper > behaviour or if that is what affecting this test. I've not successfully reproduced it yet, nor figured out a real scenario where it could plausibly happen. i'm looking to add more debug to help us out. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|