Hi, On Tue, 11 Feb 2025 at 20:50, Peter Xu <pet...@redhat.com> wrote: > > * Yes. AFAIU, tls/file channels don't send magic values. > Please double check whether TLS will send magics. AFAICT, they should. === * ... Also tls live migration already does * tls handshake while initializing main channel so with tls this * issue is not possible. */ if (qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { } else if (mis->from_src_file && (!strcmp(ioc->name, "migration-tls-incoming") || !strcmp(ioc->name, "migration-file-incoming"))) { channel = CH_MULTIFD; } === * From the comment and condition above, both 'tls' and 'file' channels are not peekable, ie. no magic value to peek. The 'migration-file-incoming' check also helps to cover the migrate_mapped_ram() case IIUC.
> No. We need to figure out a way to do that properly, and that's exactly > what I mentioned as one of the core changes we need in this series, which > is still missing. We may or may not need an ACK message. Please think > about it. * First we tried to call 'multifd_send_shutdown()' to close multifd channels before calling postcopy_start(). That's the best case scenario wherein multifd channels are closed before postcopy starts. So that there's no confusion and/or jumbling of different data packets. It did not work, as qemu would crash during multifd_shutdown(). * Second is we push/flush all multifd pages before calling postcopy_start() and let the multifd channels exist/stay till the migration ends, after that they are duly shutdown. It is working well so far, passing all migration tests too. * Third, if we want to confirm that multifd pages are received on the destination before calling postcopy_start(), then the best way is for the destination to send an acknowledgement to the source side that it has received and processed all multifd pages and nothing is pending on the multifd channels. * Another could be to define a multifd_recv_flush() function, which could process and clear the receive side multifd queue, so that no multifd pages are pending there. Not sure how best to do this yet. Also considering it lacks proper communication and synchronisation between source and destination sides, it does not seem like the best solution. * Do you have any other option/approach in mind? > Please consider the case where multifd recv threads may get scheduled out > on dest host during precopy phase, not getting chance to be scheduled until > postcopy already started running on dst, then the recv thread can stumble > upon a page that was sent during precopy. As long as that can be always > avoided, I think we should be good. * TBH, this sounds like a remote corner case. * I'm testing the revised patch series and will send it shortly. Thank you. --- - Prasad