On Tue, 2022-11-29 at 15:50 -0500, Peter Xu wrote: > On Tue, Nov 29, 2022 at 05:28:26PM -0300, Leonardo Bras Soares Passos wrote: > > Hello Peter, > > Leo, > > > > > On Thu, Nov 24, 2022 at 1:04 PM Peter Xu <pet...@redhat.com> wrote: > > > > > > On Wed, Nov 09, 2022 at 02:56:29AM -0300, Leonardo Bras wrote: > > > > diff --git a/migration/savevm.c b/migration/savevm.c > > > > index a0cdb714f7..250caff7f4 100644 > > > > --- a/migration/savevm.c > > > > +++ b/migration/savevm.c > > > > @@ -1889,6 +1889,8 @@ static void *postcopy_ram_listen_thread(void > > > > *opaque) > > > > exit(EXIT_FAILURE); > > > > } > > > > > > > > + migration_load_cleanup(); > > > > > > It's a bit weird to call multifd-load-clean in a listen phase.. > > > > I agree. > > > > > > > > How about moving it right above > > > trace_process_incoming_migration_co_postcopy_end_main()? Then the new > > > helper can also be static. > > > > Seems a nice Idea to have this function to be static. > > > > We have to guarantee this is run after the migration finished, but > > before migration_incoming_state_destroy(). > > IIUC it doesn't need to be when migration finished. It should be fine as > long as we finished precopy phase, and that's what the migration coroutine > does, iiuc. The thing is postcopy doesn't use multifd at all, so logically > it can be released before postcopy starts. > > Actually, IMHO it'll be safer to do it like that, just to make sure we > won't accidentally receive multifd pages _after_ postcopy starts, because > that'll be another more severe and hard to debug issue since the guest can > see partial copied pages from multifd recv channels. > > > > > You suggested calling it right above of > > trace_process_incoming_migration_co_postcopy_end_main(), which git > > grep pointed me to an if clause in process_incoming_migration_co(). > > If I got the location correctly, it would not help: this coroutine is > > ran just after the VM went to the target host, and not when the > > migration finished. > > > > If we are using multifd channels, this will break the migration with > > segmentation fault (SIGSEGV), since the channels have not finished > > sending yet. > > If this happens, then I had a feeling that there's something else that > needs syncs. As I discussed above, we should make sure multifd pages all > landed before we start vcpu threads. > > Said that, now I think I'm not against your original proposal to fix this > immediate crash. However I am still wondering whether we really should > disable multifd with postcopy, as there seem to be still a few missing > pieces even to enable multifd during precopy-only. > > Thanks, >
I got side-tracked on this issue. Is there any patch disabling multifd + postcopy, or would it be fine to go back working on a V2 for this one? Best regards, Leo