Hello Fabiano,

On Thu, 20 Feb 2025 at 19:06, Fabiano Rosas <faro...@suse.de> wrote:
> This is more or less the handshake idea. Or at least it could be
> included in that work.
>
> I have parked the handshake idea for now because I'm not seeing an
> immediate need for it and there are more pressing issues to be dealt
> with first such as bugs and coordinating the new features (and their
> possible outcomes) that IMO need to be looked at first.

* I see, okay.

> I'm not opposed to that idea. When I started working with migration I
> had the impression that was the direction and that we could put every
> workload in a pool of multifd threads. Now, knowing the code better, I'm
> not sure that's feasible. Especially the dependence on a "main" channel
> seems difficult to do away with. It's also somewhat convenient to have a
> main thread. But we could still attempt to group extra threads, such as
> what we're doing with the new thread pool in the device state series. At
> least thread management could be done entirely in a separate pool, main
> channel and all.
>

* True. To extend the idea of two QEMUs working in tandem (or the
handshake idea) further using the 'main' channel, let's say a user
invokes the command:

$ virsh migrate --threads 4 --postcopy --postcopy-after-precopy ...

0) Channel = TCP socket connection between two machines.

1) The 'main' channel is the dedicated _control_ channel, and the other
channels are dedicated _data_ channels. So with the '--threads 4'
option, QEMU creates a total of 5 (main + 4) channels.

        QEMU-A -> 'main' channel   -> QEMU-B
        QEMU-A -> 'data' channel-1 -> QEMU-B
        QEMU-A -> 'data' channel-2 -> QEMU-B
        QEMU-A -> 'data' channel-3 -> QEMU-B
        QEMU-A -> 'data' channel-4 -> QEMU-B

    * Each channel is used by a thread of its own.
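
    A minimal sketch of that channel layout, just to make the count
concrete. This is illustration only: the names Channel and
plan_channels are hypothetical and do not exist in QEMU or libvirt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    name: str  # hypothetical label, not a QEMU identifier
    role: str  # "control" for 'main', "data" for everything else


def plan_channels(threads: int) -> list[Channel]:
    """Return one control channel plus `threads` data channels."""
    if threads < 1:
        raise ValueError("at least one data channel is required")
    channels = [Channel("main", "control")]
    channels += [Channel(f"data-{i}", "data") for i in range(1, threads + 1)]
    return channels


if __name__ == "__main__":
    # With '--threads 4' this lists main plus data-1 .. data-4.
    for ch in plan_channels(4):
        print(ch.name, ch.role)
```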

2) All channels are created _before_ the migration starts and stay
open until the end of the migration. No channels pop up asynchronously
during migration, the way the 'postcopy' channel does now.

3) In the beginning, the source says 'Let's Precopy' to the
destination on the 'main' channel:

         QEMU-A  -> main: Let's precopy  -> QEMU-B
         QEMU-A  <- main: Okay              <- QEMU-B

    Migration data then flows from QEMU-A to QEMU-B on the 'data'
channels:

        QEMU-A ->  'data' -> -> -> QEMU-B
        QEMU-A ->  'data' -> -> -> QEMU-B
        QEMU-A ->  'data' -> -> -> QEMU-B
        QEMU-A ->  'data' -> -> -> QEMU-B

4) When it's time to switch to Postcopy, the source says 'Let's
Postcopy' to the destination on the 'main' channel:

        QEMU-A  -> main: Let's postcopy  -> QEMU-B
        QEMU-A  <- main: Okay                <- QEMU-B

    Page requests and page data then use the same 'data' channels:

        QEMU-A <- <- 'request/data'  -> -> QEMU-B
        QEMU-A <- <- 'request/data'  -> -> QEMU-B
        QEMU-A <- <- 'request/data'  -> -> QEMU-B
        QEMU-A <- <- 'request/data'  -> -> QEMU-B
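
    Steps 3) and 4) can be sketched as a small propose/acknowledge
exchange on the 'main' channel. This is only a rough model under my
own assumptions: Phase, Endpoint, on_proposal and switch_phase are
made-up names, the "okay"/"refuse" replies are placeholders, and none
of it reflects QEMU's actual migration code or wire format.

```python
from enum import Enum


class Phase(Enum):
    SETUP = "setup"
    PRECOPY = "precopy"
    POSTCOPY = "postcopy"


# Assumed ordering: setup -> precopy -> postcopy, never backwards.
ALLOWED = {
    Phase.SETUP: {Phase.PRECOPY},
    Phase.PRECOPY: {Phase.POSTCOPY},
    Phase.POSTCOPY: set(),
}


class Endpoint:
    """One side of the migration; tracks the phase it believes it is in."""

    def __init__(self) -> None:
        self.phase = Phase.SETUP

    def on_proposal(self, proposed: Phase) -> str:
        # Destination side: accept only a legal next phase.
        if proposed in ALLOWED[self.phase]:
            self.phase = proposed
            return "okay"
        return "refuse"


def switch_phase(source: Endpoint, destination: Endpoint, proposed: Phase) -> bool:
    """Source proposes a phase on 'main'; both sides switch only on 'okay'."""
    if proposed not in ALLOWED[source.phase]:
        return False
    if destination.on_proposal(proposed) != "okay":
        return False
    source.phase = proposed
    return True


if __name__ == "__main__":
    src, dst = Endpoint(), Endpoint()
    print(switch_phase(src, dst, Phase.PRECOPY))   # "Let's precopy"
    print(switch_phase(src, dst, Phase.POSTCOPY))  # "Let's postcopy"
```

    The point of the sketch is that the data channels never need to
negotiate anything themselves; they just follow whatever phase was
agreed on 'main'.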

5) This way:
     - The 'main' channel could be used to co-ordinate the actions of
the two QEMUs.
     - All data channels may be used during Postcopy too, instead of
the single channel used now.
     - There should be no race conditions while creating channels.
     - No need to differentiate between precopy/multifd/postcopy/preempt
etc. channels.

(thinking out loud if that sounds workable)

Thank you.
---
  - Prasad

