When doing cleanup of the multifd send threads we're calling QLIST_REMOVE concurrently on the migration_threads list. This seems to be the source of the crashes we've seen on the multifd/tcp/plain/cancel tests.
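
To illustrate the kind of protection the list needs, here is a minimal standalone sketch. It is not the code from this series: it assumes the BSD <sys/queue.h> LIST macros as a stand-in for QEMU's QLIST and a plain pthread mutex as a stand-in for a QEMU lock, and every name in it (DemoThread, demo_threads_lock, demo_thread_add, demo_thread_del, demo_concurrent_removal) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for MigrationThread; not the real QEMU struct. */
typedef struct DemoThread {
    int id;
    LIST_ENTRY(DemoThread) node;
} DemoThread;

/* The shared list and the lock that guards every access to it. */
static LIST_HEAD(DemoThreadList, DemoThread) demo_threads =
    LIST_HEAD_INITIALIZER(demo_threads);
static pthread_mutex_t demo_threads_lock = PTHREAD_MUTEX_INITIALIZER;

static DemoThread *demo_thread_add(int id)
{
    DemoThread *t = calloc(1, sizeof(*t));

    assert(t);
    t->id = id;
    pthread_mutex_lock(&demo_threads_lock);
    LIST_INSERT_HEAD(&demo_threads, t, node);
    pthread_mutex_unlock(&demo_threads_lock);
    return t;
}

static void demo_thread_del(DemoThread *t)
{
    if (!t) {
        return;
    }
    /* Unlinking touches the neighbours' pointers, so it must be serialized. */
    pthread_mutex_lock(&demo_threads_lock);
    LIST_REMOVE(t, node);
    pthread_mutex_unlock(&demo_threads_lock);
    free(t);
}

/* Each worker removes its own entry, like a send thread cleaning up. */
static void *demo_worker(void *opaque)
{
    demo_thread_del(opaque);
    return NULL;
}

/* Adds nthreads entries, removes them concurrently, returns entries left. */
int demo_concurrent_removal(int nthreads)
{
    pthread_t *tids = calloc(nthreads, sizeof(*tids));
    DemoThread **entries = calloc(nthreads, sizeof(*entries));
    DemoThread *t;
    int i, remaining = 0;

    assert(tids && entries);
    for (i = 0; i < nthreads; i++) {
        entries[i] = demo_thread_add(i);
    }
    for (i = 0; i < nthreads; i++) {
        int ret = pthread_create(&tids[i], NULL, demo_worker, entries[i]);
        assert(ret == 0);
    }
    for (i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
    }
    /* All workers have exited, so the list can be walked without the lock. */
    for (t = demo_threads.lh_first; t; t = t->node.le_next) {
        remaining++;
    }
    free(entries);
    free(tids);
    return remaining;
}
```

Calling demo_concurrent_removal(16) from a main() (built with -pthread) should leave the list empty. Dropping the lock from demo_thread_del() reintroduces the same kind of race on the neighbouring entries' pointers.
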
I'm running the test in a loop and after a few dozen iterations I see the crash in dmesg.

  QTEST_QEMU_BINARY=./qemu-system-x86_64 \
  QEMU_TEST_FLAKY_TESTS=1 \
  ./tests/qtest/migration-test -p /x86_64/migration/multifd/tcp/plain/cancel

  multifdsend_10[11382]: segfault at 18 ip 0000564b77de1e25 sp 00007fdf767fb610 error 6 in qemu-system-x86_64[564b777b4000+e1c000]
  Code: ec 10 48 89 7d f8 48 83 7d f8 00 74 58 48 8b 45 f8 48 8b 40 10 48 85 c0 74 14 48 8b 45 f8 48 8b 40 10 48 8b 55 f8 48 8b 52 18 <48> 89 50 18 48 8b 45 f8 48 8b 40 18 48 8b 55 f8 48 8b 52 10 48 89

The offending instruction is a mov dereferencing the thread->node.le_next pointer at QLIST_REMOVE in MigrationThreadDel:

  void MigrationThreadDel(MigrationThread *thread)
  {
      if (thread) {
          QLIST_REMOVE(thread, node);
          g_free(thread);
      }
  }

where:

  #define QLIST_REMOVE(elm, field) do {                   \
      if ((elm)->field.le_next != NULL)                   \
          (elm)->field.le_next->field.le_prev =           \  <-- HERE
              (elm)->field.le_prev;                       \
      *(elm)->field.le_prev = (elm)->field.le_next;       \
      (elm)->field.le_next = NULL;                        \
      (elm)->field.le_prev = NULL;                        \
  } while (/*CONSTCOND*/0)

The MigrationThreadDel function is called from the multifd threads and is not under any lock, so several calls can race when accessing the list.

(I actually hit this first on my fixed-ram branch, which changes some synchronization in multifd and makes the issue more frequent.)

CI run: https://gitlab.com/farosas/qemu/-/pipelines/891000519

Fabiano Rosas (3):
  migration/multifd: Rename threadinfo.c functions
  migration/multifd: Protect accesses to migration_threads
  tests/qtest: Re-enable multifd cancel test

 migration/migration.c        |  7 +++++--
 migration/multifd.c          |  5 +++--
 migration/threadinfo.c       | 23 ++++++++++++++++++++---
 migration/threadinfo.h       |  8 ++++----
 tests/qtest/migration-test.c | 10 ++--------
 5 files changed, 34 insertions(+), 19 deletions(-)

-- 
2.35.3