* Peter Xu (pet...@redhat.com) wrote:
> On Fri, Jul 06, 2018 at 11:56:59AM +0100, Dr. David Alan Gilbert wrote:
> > * Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:
> > > * Peter Xu (pet...@redhat.com) wrote:
> > > > Based-on: <20180627132246.5576-1-pet...@redhat.com>
> > > >
> > > > Based on the series to unbreak postcopy:
> > > >   Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
> > > >   Message-Id: <20180627132246.5576-1-pet...@redhat.com>
> > > >
> > > > This series introduce a new postcopy recovery test. The new test
> > > > actually helped me to identify two bugs there so fix them as well
> > > > before 3.0 release.
> > > >
> > > > Patch 1: a trivial cleanup for existing postcopy ram load, which I
> > > >          found a bit confusing during debugging the problem.
> > > >
> > > > Patch 2-3: two bug fixes that address different issues. Please see
> > > >            the commit log for more information.
> > > >
> > > > Patch 4-9: add the postcopy recovery unit test.
> > > >
> > > > Please review. Thanks,
> > >
> > > Queued
> >
> > Hi Peter,
> >   There's a problem in there somewhere; I'm getting
> > an intermittent failure of the test if I run a make check -j 8 on my
> > laptop. Just running two copies of tests/migration-test in parallel
> > sometimes triggers it (but not if I turn on QTEST_LOG!).
> > But it's always failing with:
> >
> > ERROR:/home/dgilbert/git/migpull/tests/migration-test.c:373:migrate_recover:
> > assertion failed: (qdict_haskey(rsp, "return"))
>
> Hmm, so this should be a race. I suspect it's because destination VM
> hasn't reached the correct state when sending the recovery command.
>
> Could you help to try these two tiny patches to see whether it can fix
> the problem?

Yes, this seems to work; even running 6 in parallel.

Dave

> ================
>
> commit d875ea1a98932174e3fa202859b65df26def174d
> Author: Peter Xu <pet...@redhat.com>
> Date:   Tue Jul 10 11:17:24 2018 +0800
>
>     migration: show pause/recover state on dst host
>
>     These two states will be missing when doing "query-migrate" on
>     destination VM. Add these states so that we can get the query results
>     as expected.
>
>     Signed-off-by: Peter Xu <pet...@redhat.com>
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 0404c53215..8d56d56930 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -911,6 +911,8 @@ static void fill_destination_migration_info(MigrationInfo *info)
>      case MIGRATION_STATUS_CANCELLED:
>      case MIGRATION_STATUS_ACTIVE:
>      case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
>      case MIGRATION_STATUS_FAILED:
>      case MIGRATION_STATUS_COLO:
>          info->has_status = true;
>
> ================
>
> commit 9fa7fc773961cd0ea0b5f70a166def0d8aebf464
> Author: Peter Xu <pet...@redhat.com>
> Date:   Tue Jul 10 11:18:48 2018 +0800
>
>     tests: don't send recovery cmd until dst pauses
>
>     Signed-off-by: Peter Xu <pet...@redhat.com>
>
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index 96e69dab99..45558446f1 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -646,6 +646,13 @@ static void test_postcopy_recovery(void)
>       */
>      migrate_pause(from);
>
> +    /*
> +     * Wait for destination side to reach postcopy-paused state.  The
> +     * migrate-recover command can only succeed if destination machine
> +     * is in the paused state
> +     */
> +    wait_for_migration_status(to, "postcopy-paused");
> +
>      /*
>       * Create a new socket to emulate a new channel that is different
>       * from the broken migration channel; tell the destination to
>
> ================
>
> Thanks!
>
> --
> Peter Xu
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
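
For readers following the race discussed in this thread: the second patch makes the test poll the destination until it reports "postcopy-paused" before issuing migrate-recover, since sending the command earlier makes it fail. The minimal, self-contained C sketch below models that pattern with a simulated destination. FakeDest, fake_query_status, fake_migrate_recover and wait_for_status are hypothetical names used only for illustration; they are not QEMU's test helpers or its QMP API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the destination VM. In the real test the
 * status would come from a "query-migrate" reply; here a counter simulates
 * the delay before the destination notices the broken channel. */
typedef struct {
    const char *status;   /* current status string, e.g. "postcopy-active" */
    int ticks_to_pause;   /* polls remaining until "postcopy-paused" (0 = never) */
} FakeDest;

/* Hypothetical status query: advances the simulation by one step and
 * returns the current status. */
static const char *fake_query_status(FakeDest *dst)
{
    if (dst->ticks_to_pause > 0 && --dst->ticks_to_pause == 0) {
        dst->status = "postcopy-paused";
    }
    return dst->status;
}

/* Hypothetical recover command: succeeds only in the paused state,
 * mirroring the rule stated in the test patch's comment. */
static bool fake_migrate_recover(const FakeDest *dst)
{
    return strcmp(dst->status, "postcopy-paused") == 0;
}

/* Poll until the destination reports `want`, giving up after `max_polls`
 * attempts. Returns true if the status was reached. A real test would
 * normally sleep briefly between polls. */
static bool wait_for_status(FakeDest *dst, const char *want, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (strcmp(fake_query_status(dst), want) == 0) {
            return true;
        }
    }
    return false;
}
```

Usage: with `FakeDest d = { "postcopy-active", 3 };`, calling `fake_migrate_recover(&d)` immediately fails (the race), while calling it after `wait_for_status(&d, "postcopy-paused", 10)` returns true succeeds. Bounding the number of polls keeps a destination that never pauses from hanging the test forever.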