Re: [PATCH v5 3/5] migration: enable multifd and postcopy together

Prasad Pandit Wed, 12 Feb 2025 05:29:08 -0800

Hi,

On Tue, 11 Feb 2025 at 20:50, Peter Xu <pet...@redhat.com> wrote:
> > * Yes. AFAIU, tls/file channels don't send magic values.
> Please double check whether TLS will send magics.  AFAICT, they should.
===
  * ... Also tls live migration already does
  * tls handshake while initializing main channel so with tls this
  * issue is not possible.
  */
  if (qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) {
  } else if (mis->from_src_file
           && (!strcmp(ioc->name, "migration-tls-incoming")
            || !strcmp(ioc->name, "migration-file-incoming"))) {
        channel = CH_MULTIFD;
  }
===
* From the comment and condition above, neither the 'tls' nor the
'file' channel is peekable, i.e. there is no magic value to peek at.
The 'migration-file-incoming' check also covers the
migrate_mapped_ram() case, IIUC.
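
* To make the condition above concrete, here is a minimal standalone
sketch of the two paths: peek at a magic value when the transport
allows it, otherwise fall back to the channel name once the main
channel exists. It is not QEMU code: classify_channel(), CH_UNKNOWN,
CH_MAIN and both SKETCH_* magic values are hypothetical stand-ins, and
what happens before the main channel exists is deliberately left
undecided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; the magic values are illustrative, not QEMU's. */
enum channel_kind { CH_UNKNOWN, CH_MAIN, CH_MULTIFD };
#define SKETCH_MAIN_MAGIC    0x11111111u
#define SKETCH_MULTIFD_MAGIC 0x22222222u

/*
 * Decide what kind of channel an incoming connection is.
 *   can_peek:   the transport lets us look at the first bytes
 *   magic:      the peeked value (only meaningful when can_peek is true)
 *   main_ready: the main channel has already been established
 *   name:       channel name, consulted only when peeking is impossible
 */
static enum channel_kind classify_channel(bool can_peek, uint32_t magic,
                                          bool main_ready, const char *name)
{
    assert(name != NULL);

    if (can_peek) {
        if (magic == SKETCH_MAIN_MAGIC) {
            return CH_MAIN;
        }
        if (magic == SKETCH_MULTIFD_MAGIC) {
            return CH_MULTIFD;
        }
        return CH_UNKNOWN;
    }

    /* No peek support: rely on the name, as in the quoted condition. */
    if (main_ready &&
        (!strcmp(name, "migration-tls-incoming") ||
         !strcmp(name, "migration-file-incoming"))) {
        return CH_MULTIFD;
    }

    /* Assumption: anything else is left for the caller to decide. */
    return CH_UNKNOWN;
}
```

With this sketch, a non-peekable "migration-tls-incoming" connection
is classified as multifd only after the main channel is up; the name
check never looks at the data on the wire.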


> No.  We need to figure out a way to do that properly, and that's exactly
> what I mentioned as one of the core changes we need in this series, which
> is still missing.  We may or may not need an ACK message.  Please think
> about it.

* First, we tried calling 'multifd_send_shutdown()' to close the
multifd channels before calling postcopy_start(). That is the
best-case scenario: the multifd channels are closed before postcopy
starts, so there is no confusion or jumbling of different data
packets. It did not work, as qemu would crash during
multifd_shutdown().

* Second, we push/flush all multifd pages before calling
postcopy_start() and let the multifd channels stay open until the
migration ends, after which they are duly shut down. This is working
well so far and passes all migration tests.

* Third, if we want to confirm that multifd pages are received on the
destination before calling postcopy_start(), then the best way is for
the destination to send an acknowledgement to the source side that it
has received and processed all multifd pages and nothing is pending on
the multifd channels.

* Another option could be to define a multifd_recv_flush() function,
which would process and clear the receive-side multifd queue so that
no multifd pages are pending there. Not sure how best to do this yet.
Also, since it lacks proper communication and synchronisation between
the source and destination sides, it does not seem like the best
solution.

* Do you have any other option/approach in mind?
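
* For reference, the acknowledgement idea (third option above) can be
modelled roughly as below. This is only a single-threaded sketch with
counters standing in for the real channels and threads; struct
mig_sketch and all four functions are hypothetical names, not existing
QEMU APIs. The point it shows is that the source starts postcopy only
after the destination confirms that every page sent before the flush
request has been applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: counters instead of real multifd channels. */
struct mig_sketch {
    size_t pages_sent;       /* pages the source queued on multifd channels */
    size_t pages_applied;    /* pages the destination has applied */
    size_t flush_upto;       /* value of pages_sent when flush was requested */
    bool   flush_requested;  /* a flush request is outstanding */
    bool   ack_received;     /* destination acknowledged the last flush */
};

/* Source: queue n more pages. Not allowed while a flush is outstanding. */
static void src_send_pages(struct mig_sketch *m, size_t n)
{
    assert(!m->flush_requested);
    m->pages_sent += n;
    m->ack_received = false;   /* any earlier ack no longer covers these */
}

/* Source: ask the destination to drain everything sent so far. */
static void src_request_flush(struct mig_sketch *m)
{
    m->flush_upto = m->pages_sent;
    m->flush_requested = true;
    m->ack_received = false;
}

/*
 * Destination: apply at most 'budget' pending pages, as a recv thread
 * would in one scheduling slice, then ack if the flush is satisfied.
 */
static void dst_recv_step(struct mig_sketch *m, size_t budget)
{
    while (budget > 0 && m->pages_applied < m->pages_sent) {
        m->pages_applied++;
        budget--;
    }
    if (m->flush_requested && m->pages_applied >= m->flush_upto) {
        m->flush_requested = false;
        m->ack_received = true;
    }
}

/* Source: postcopy may start only after the ack has arrived. */
static bool src_can_start_postcopy(const struct mig_sketch *m)
{
    return m->ack_received;
}
```

In this model, a destination that has only applied part of the
precopy pages (for example because its recv threads were scheduled
out) keeps src_can_start_postcopy() false until it catches up.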

> Please consider the case where multifd recv threads may get scheduled out
> on dest host during precopy phase, not getting chance to be scheduled until
> postcopy already started running on dst, then the recv thread can stumble
> upon a page that was sent during precopy.  As long as that can be always
> avoided, I think we should be good.

* TBH, this sounds like a remote corner case.

* I'm testing the revised patch series and will send it shortly.

Thank you.
---
  - Prasad
