* Peter Xu (pet...@redhat.com) wrote: > v3: > - Use WITH_QEMU_LOCK_GUARD() for patch 2 [Eric] > (potentially I can also replace other existing uses of qemu_file_lock into > WITH_QEMU_LOCK_GUARD, but I decided to took Dave's r-b first and leave that > for later) > - Added r-bs for Dave on patch 2/4 > - Add a comment in patch 5 to explain why it's safe to unregister yank without > qemu_file_lock, in postcopy_pause() [Dave] > > v2: > - Pick r-b for Dave on patch 1/3 > - Move migration_file_get_ioc() from patch 5 to patch 4, meanwhile rename it > to > qemu_file_get_ioc(). [Dave] > - Patch 2 "migration: Shutdown src in await_return_path_close_on_source()" is > replaced by patch "migration: Make from_dst_file accesses thread-safe" > [Dave] > > Patch 1 fixes a possible race that migration thread can accidentally skip > join() of rp_thread even if the return thread is enabled. Patch 1 is > suspected > to also be the root cause of the recent hard-to-reproduce migration-test > failure here reported by PMM: > > https://lore.kernel.org/qemu-devel/YPamXAHwan%2FPPXLf@work-vm/ > > I didn't reproduce it myself; but after co-debugged with Dave it's suspected > that the race of rp_thread could be the cause. It's not exposed before > because > yank is soo strict on releasing instances, while we're not that strict before, > and didn't join() on rp_thread wasn't so dangerous after all when migration > succeeded before. > > Patch 2 fixes another theoretical race on accessing from_dst_file spotted by > Dave. I don't think there's known issues with it, but may still worth fixing. > > Patch 3 should be a cleanup on yank that I think would be nice to have. > > Patch 4-5 are further cleanups to remove the ref==1 check in channel_close(), > finally, as I always thought that's a bit hackish. So I used explicit > unregister of the yank function at necessary places to replace that ref==1 > one. > > I still think having patch 3-5 altogether would be great, however I think > patch > 1 should still be the most important to be reviewed. Also it would be great > to > know whether patch 1 could fix the known yank crash. > > Please review, thanks.
Queued with the fix; it survived over a long weekend running about 50k times OK. Dave > Peter Xu (5): > migration: Fix missing join() of rp_thread > migration: Make from_dst_file accesses thread-safe > migration: Introduce migration_ioc_[un]register_yank() > migration: Teach QEMUFile to be QIOChannel-aware > migration: Move the yank unregister of channel_close out > > migration/channel.c | 15 ++------- > migration/migration.c | 57 +++++++++++++++++++++++++++-------- > migration/migration.h | 15 +++++++-- > migration/multifd.c | 8 ++--- > migration/qemu-file-channel.c | 11 ++----- > migration/qemu-file.c | 17 ++++++++++- > migration/qemu-file.h | 4 ++- > migration/ram.c | 3 +- > migration/savevm.c | 11 +++++-- > migration/yank_functions.c | 42 ++++++++++++++++++++++++++ > migration/yank_functions.h | 3 ++ > 11 files changed, 138 insertions(+), 48 deletions(-) > > -- > 2.31.1 > > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK