Hello Fabiano,

On Tue, 18 Feb 2025 at 03:20, Fabiano Rosas <faro...@suse.de> wrote:
> > -static bool migration_should_start_incoming(bool main_channel)
> > +static bool migration_has_main_and_multifd_channels(void)
> >  {
...
> > +    /* main channel and all multifd channels are established */
> >      return true;
> >  }
>
> How will this avoid peeking the preempt channel? You're assuming preempt
> is mutually exclusive with multifd it seems. Otherwise you could get the
> preempt channel creation racing with multifd channels creation.

* IIUC postcopy preempt channel is created towards the end of the
migration when the Postcopy phase starts. At that time, the 'main' and
'multifd' channels are already established and working. Having the
'main' and when multifd is enabled 'multifd' channels in place is a
requirement for starting new migration. So when both the 'main' and
'multifd' channels are established, the new incoming connection is
seen as the 'postcopy' one; And it falls to the 'else' block of the
'if' conditional ->  if (!migration_has_main_and_multifd_channels()) {

* When does postcopy preempt channel creation race with the multifd
channel creation?

> > @@ -989,13 +983,12 @@ void migration_ioc_process_incoming(QIOChannel *ioc, 
> > Error **errp)
> > +    if (!migration_has_main_and_multifd_channels()) {
> I'm not entirely sure we need these checks here. They will happen anyway
> later. Could this be replaced by migration_needs_multiple_sockets()
> instead?

* migration_needs_multiple_sockets() => return migrate_multifd() ||
migrate_postcopy_preempt();

* That logical OR should have been AND, no? It returns true even when
one of them is true. That's not multiple types (multifd/postcopy) of
sockets. I don't think it helps much.

* Let's try this:
    - First differentiation: peekable Vs non-peekable channels
    - Peekable channels
         - Main
         - Multifd
    - Non-peekable channels
         - Postcopy preempt
         - TLS
         - File/mapped_ram

    if (qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK))
    {
            if (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC)) {
                channel = CH_MAIN;
            } else if (channel_magic == cpu_to_be32(MULTIFD_MAGIC)) {
                channel = CH_MULTIFD;
    } else {
            if (!strcmp(ioc->name, "migration-tls-incoming")
                || !strcmp(ioc->name, "migration-file-incoming"))) {
               channel = CH_MULTIFD;
            } else {
               channel = CH_POSTCOPY;
            }
    }

* With above, the 'main' channel shall have to send 'magic value' even
for reconnection during the postcopy recovery phase. If all channels
were consistent and sent a magic value, this code would be much
simpler and we may not have to care/worry about the _order_ in which
these connections are made.

   if (channel == 'main')
       process_main_channel()
   else if (channel == 'multifd')
       process_multifd_channel()
   else if (channel == 'tls')
       process_tls_channel()
    else if (channel == 'file')
       process_file_channel()
    else if (channel == 'postcopy')
       process_postcopy_channel()

> And I'd put this whole channel discovery business in channel.c since
> it's encoding several assumptions about channels. Some helpers used here
> might need to be exported, but that's ok.
>
> Also, please make a separate patch, we need to be really confident that
> changing the discovery code around won't introduce any regression, and
> if it does, we'll want it separate from the postcopy+multifd
> enablement. It's ok if you have the patch assume that multifd+postcopy
> will happen later in the series.

* TBH, I think we have complicated this whole thing with multiple
channel types, their inconsistent behaviour and dependence on the
_order_ in which connections are made. Do we really need channel
types? Could we consider rationalising them?

> > +        if (qio_channel_has_feature(ioc, 
> > QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) {
> >          /*
> >           * With multiple channels, it is possible that we receive channels
> >           * out of order on destination side, causing incorrect mapping of
> > @@ -1006,42 +999,58 @@ void migration_ioc_process_incoming(QIOChannel *ioc, 
> > Error **errp)
> >           * tls handshake while initializing main channel so with tls this
> >           * issue is not possible.
> >           */
>
> This comment block needs to be indented properly.
>
> > -        ret = migration_channel_read_peek(ioc, (void *)&channel_magic,
> > -                                          sizeof(channel_magic), errp);
> > +            ret = migration_channel_read_peek(ioc, (void *)&channel_magic,
> > +                                              sizeof(channel_magic), errp);
> > +            if (ret != 0) {
> > +                return;
> > +            }
> >
> > -        if (ret != 0) {
> > -            return;
> > +            if (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC)) {
> > +                channel = CH_MAIN;
> > +            } else if (channel_magic == cpu_to_be32(MULTIFD_MAGIC)) {
> > +                channel = CH_MULTIFD;
> > +            } else if (!mis->from_src_file
> > +                        && mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) 
> > {
>
> The usual style is to keep the && on the previous line.
>
> > +                /* reconnect default channel for postcopy recovery */
>
> s/default/main/

* Okay, will fix these.


> > +                channel = CH_MAIN;
> > +            } else {
> > +                error_report("%s: unknown channel magic: %u",
> > +                                __func__, channel_magic);
> > +                return;
>
> This needs to set errp instead of reporting.

* Okay.

> > +            }
> > +        } else if (mis->from_src_file
> > +                && (!strcmp(ioc->name, "migration-tls-incoming")
> > +                || !strcmp(ioc->name, "migration-file-incoming"))) {
> > +            channel = CH_MULTIFD;
>
> This is quite misleading. These channels are used without multifd as
> well. For instance, file-backed fd migration goes past this because
> !mis->from_src_file but it still uses the file channel.
>
> I agree with Peter that checking for channel names is not ideal. I don't
> see an alternative at the moment (hiding the assumption is of course not
> a fix). Maybe check migrate_multifd() here and explain in a comment that
> at the moment, the non-peekable channels happen to be used with multifd
> only.

* The first paragraph says these channels are used without
migrate_multifd(); And the second paragraph says they are used with
migrate_multifd() only....??
===
} else {
        /* Multiple connections */
        assert(migration_needs_multiple_sockets());
        if (migrate_multifd()) {
            multifd_recv_new_channel(ioc, &local_err);
        } else {
            assert(migrate_postcopy_preempt());
            f = qemu_file_new_input(ioc);
            postcopy_preempt_new_channel(mis, f);
        }
===
* IIUC in the current code, when migrate_multifd() is true, these
channels get processed via - multifd_recv_new_channel(). And when
migrate_multifd() is false, it falls to postcopy_preempt_new_channel()
part. Not sure if/how that works really. It is possible that currently
these (tls/file) channels are used only with migrate_multifd() enabled
and so are processed with multifd_recv_new_channel() function. The
current patch handles them the same way.

* About checking channel names, in the non-peekable category above,
how do we differentiate between 'TLS', 'File' and 'Postcopy' channels?
Without magic values, we don't have much choice really.  And seeing
those names in the code also tells the reader that 'TLS' and 'File'
channels are processed as CH_MULTIFD via - multifd_recv_new_channel().

> No else clause here and in the rest of the patch makes this is as opaque
> as the previous version, IMO. We need to define what's supposed to
> happen whenever the conditionals don't match. Is it an error,
> g_assert_not_reached(), a fallthrough? Better to set CH_MAIN explicitly
> wherenever that's the case.

* I'd say let's treat unmatched conditions as an error and return.

> > -        default_channel = (channel_magic == 
> > cpu_to_be32(QEMU_VM_FILE_MAGIC));
> > -    } else {
> > -        default_channel = !mis->from_src_file;
> > +    } else if (mis->from_src_file) {
> > +        channel = CH_POSTCOPY;
> >      }
>
> Same here.

* Here final else means the main channel (mis->from_src_file) is not
established, so it defaults to CH_MAIN.

>
> You could check CH_POSTCOPY first in the block above this one and return
> early.
>

* Okay.

* My request is the same - let's think about having consistent channel
behaviour. The restructuring and overhauling of the channel processing
part could be done separately, outside the current scope of enabling
multifd+postcopy together with this series. Let's not try to fix
everything in one go, in one series.


Thank you.
---
  - Prasad


Reply via email to