Fabiano Rosas <faro...@suse.de> writes:

> Peter Xu <pet...@redhat.com> writes:
>
>> From: Fabiano Rosas <faro...@suse.de>
>>
>> To do so, create two paired sockets, but make them not providing real data.
>> Feed those fake sockets to src/dst QEMUs for recovery to let them go into
>> RECOVER stage without going out. Test that we can always kick it out and
>> recover again with the right ports.
>>
>> This patch is based on Fabiano's version here:
>>
>> https://lore.kernel.org/r/877cowmdu0....@suse.de
>>
>> Signed-off-by: Fabiano Rosas <faro...@suse.de>
>> [peterx: write commit message, remove case 1, fix bugs, and more]
>> Signed-off-by: Peter Xu <pet...@redhat.com>
>> ---
>>  tests/qtest/migration-test.c | 94 ++++++++++++++++++++++++++++++++++++
>>  1 file changed, 94 insertions(+)
>>
>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> index 46f1c275a2..fb7a3765e4 100644
>> --- a/tests/qtest/migration-test.c
>> +++ b/tests/qtest/migration-test.c
>> @@ -729,6 +729,7 @@ typedef struct {
>>      /* Postcopy specific fields */
>>      void *postcopy_data;
>>      bool postcopy_preempt;
>> +    bool postcopy_recovery_test_fail;
>>  } MigrateCommon;
>>
>>  static int test_migrate_start(QTestState **from, QTestState **to,
>> @@ -1381,6 +1382,78 @@ static void test_postcopy_preempt_tls_psk(void)
>>  }
>>  #endif
>>
>> +static void wait_for_postcopy_status(QTestState *one, const char *status)
>> +{
>> +    wait_for_migration_status(one, status,
>> +                              (const char * []) { "failed", "active",
>> +                                                  "completed", NULL });
>> +}
>> +
>> +static void postcopy_recover_fail(QTestState *from, QTestState *to)
>> +{
>> +    int ret, pair1[2], pair2[2];
>> +    char c;
>> +
>> +    /* Create two unrelated socketpairs */
>> +    ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair1);
>> +    g_assert_cmpint(ret, ==, 0);
>> +
>> +    ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair2);
>> +    g_assert_cmpint(ret, ==, 0);
>> +
>> +    /*
>> +     * Give the guests unpaired ends of the sockets, so they'll all blocked
>> +     * at reading.  This mimics a wrong channel established.
>> +     */
>> +    qtest_qmp_fds_assert_success(from, &pair1[0], 1,
>> +                                 "{ 'execute': 'getfd',"
>> +                                 "  'arguments': { 'fdname': 'fd-mig' }}");
>> +    qtest_qmp_fds_assert_success(to, &pair2[0], 1,
>> +                                 "{ 'execute': 'getfd',"
>> +                                 "  'arguments': { 'fdname': 'fd-mig' }}");
>> +
>> +    /*
>> +     * Write the 1st byte as QEMU_VM_COMMAND (0x8) for the dest socket, to
>> +     * emulate the 1st byte of a real recovery, but stops from there to
>> +     * keep dest QEMU in RECOVER.  This is needed so that we can kick off
>> +     * the recover process on dest QEMU (by triggering the G_IO_IN event).
>> +     *
>> +     * NOTE: this trick is not needed on src QEMUs, because src doesn't
>> +     * rely on an pre-existing G_IO_IN event, so it will always trigger the
>> +     * upcoming recovery anyway even if it can read nothing.
>> +     */
>> +#define QEMU_VM_COMMAND              0x08
>> +    c = QEMU_VM_COMMAND;
>> +    ret = send(pair2[1], &c, 1, 0);
>> +    g_assert_cmpint(ret, ==, 1);
>> +
>> +    migrate_recover(to, "fd:fd-mig");
>> +    migrate_qmp(from, "fd:fd-mig", "{'resume': true}");
>> +
>> +    /*
>> +     * Make sure both QEMU instances will go into RECOVER stage, then test
>> +     * kicking them out using migrate-pause.
>> +     */
>> +    wait_for_postcopy_status(from, "postcopy-recover");
>> +    wait_for_postcopy_status(to, "postcopy-recover");
>
> Is this wait out of place? I think we're trying to resume too fast after
> migrate_recover():
>
> # {
> #     "error": {
> #         "class": "GenericError",
> #         "desc": "Cannot resume if there is no paused migration"
> #     }
> # }
>
Ugh, sorry about the long lines:

{
    "error": {
        "class": "GenericError",
        "desc": "Cannot resume if there is no paused migration"
    }
}
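
For anyone following the thread, the socket trick in the quoted patch can be
tried in isolation. The following is only a minimal sketch, assuming plain
POSIX socketpair() and poll() in place of qemu_socketpair() and QEMU's
G_IO_IN watch; fd_readable() and demo_unpaired_sockets() are hypothetical
names, not part of migration-test.c. It shows that the ends handed to the two
sides have nothing to read until a single byte (0x08, the value the patch
defines as QEMU_VM_COMMAND) is written to the peer of the destination's end.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd has data to read now, 0 if not, -1 on error. */
static int fd_readable(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int ret = poll(&p, 1, 0);   /* timeout 0: query without blocking */

    if (ret < 0) {
        return -1;
    }
    return (ret > 0 && (p.revents & POLLIN)) ? 1 : 0;
}

/*
 * Hypothetical demo: create two unrelated socketpairs, treat pair1[0] as
 * the "source" end and pair2[0] as the "destination" end, then write one
 * byte to pair2[1] only.  The out parameters report the readability of
 * each end before and after that write.  Returns 0 on success.
 */
static int demo_unpaired_sockets(int *src_before, int *dst_before,
                                 int *dst_after)
{
    int pair1[2], pair2[2];
    int ret = -1;
    char c = 0x08;  /* same value the patch uses for QEMU_VM_COMMAND */

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair1) < 0) {
        return -1;
    }
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair2) < 0) {
        close(pair1[0]);
        close(pair1[1]);
        return -1;
    }

    /* Nothing has been written yet, so neither end is readable. */
    *src_before = fd_readable(pair1[0]);
    *dst_before = fd_readable(pair2[0]);

    /* One byte on the peer makes the destination end readable. */
    if (send(pair2[1], &c, 1, 0) == 1) {
        *dst_after = fd_readable(pair2[0]);
        ret = 0;
    }

    close(pair1[0]);
    close(pair1[1]);
    close(pair2[0]);
    close(pair2[1]);
    return ret;
}
```

Calling demo_unpaired_sockets() from a small main() and printing the three
flags is enough to see the effect. This says nothing about the "Cannot resume
if there is no paused migration" error itself, which concerns migration state
rather than the sockets.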