On Fri, Mar 03, 2023 at 10:10:28AM +0100, Juan Quintela wrote: > Daniel P. Berrangé <berra...@redhat.com> wrote: > > On Thu, Mar 02, 2023 at 05:22:11PM +0000, Peter Maydell wrote: > >> migration-test has been flaky for a long time, both in CI and > >> otherwise: > >> > >> https://gitlab.com/qemu-project/qemu/-/jobs/3806090216 > >> (a FreeBSD job) > >> 32/648 > >> ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: > >> assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT) > >> ERROR > >> > >> on a local macos x86 box: > >> ▶ 34/621 > >> ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: > >> assertion failed: (!g_str_equal(status, "failed")) ERROR > >> 34/621 qemu:qtest+qtest-i386 / qtest-i386/migration-test > >> ERROR 168.12s killed by signal 6 SIGABRT > >> ――――――――――――――――――――――――――――――――――――― ✀ > >> ――――――――――――――――――――――――――――――――――――― > >> stderr: > >> qemu-system-i386: Failed to peek at channel > >> query-migrate shows failed migration: Unable to write to socket: Broken > >> pipe > >> ** > >> ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: > >> assertion failed: (!g_str_equal(status, "failed")) > >> > >> (test program exited with status code -6) > >> ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― > >> > >> ▶ 37/621 > >> ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: > >> assertion failed: (!g_str_equal(status, "failed")) ERROR > >> 37/621 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test > >> ERROR 174.37s killed by signal 6 SIGABRT > >> ――――――――――――――――――――――――――――――――――――― ✀ > >> ――――――――――――――――――――――――――――――――――――― > >> stderr: > >> query-migrate shows failed migration: Unable to write to socket: Broken > >> pipe > >> ** > >> ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: > >> assertion failed: (!g_str_equal(status, "failed")) > >> > >> (test program exited with status code -6) > >> > >> In the cases where I've looked at the underlying log, this seems to > >> be in the migration/multifd/tcp/plain/cancel subtest. Disable that > >> specific subtest by default until somebody can track down the > >> underlying cause. Enthusiasts can opt back in by setting > >> QEMU_TEST_FLAKY_TESTS=1 in their environment. > > > > No objection to disabling the test. Given the many multifd fixes we > > have seen, I fear that unlikely many of the flakey tests, this is > > not merely a test problem, but rather has a decent chance of being > > a real bug in migration code. > > What is really weird with this failure is that: > - it only happens on non-x86
That doesn't seem right as the two reports Peter has above are both x86 > - on code that is not arch dependent > - on cancel, what we really do there is close fd's for the multifd > channel threads to get out of the recv, i.e. again, nothing that > should be arch dependent. > > As said in the other email, I expect to get back access to ARM servers > next week. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|