Hello Fabiano,

On Thu, 20 Feb 2025 at 19:06, Fabiano Rosas <faro...@suse.de> wrote:
> This is more or less the handshake idea. Or at least it could be
> included in that work.
>
> I have parked the handshake idea for now because I'm not seeing an
> immediate need for it and there are more pressing issues to be dealt
> with first such as bugs and coordinating the new features (and their
> possible outcomings) that IMO need to be looked at first.

* I see, okay.

> I'm not opposed to that idea. When I started working with migration I
> had the impression that was the direction and that we could put every
> workload in a pool of multifd threads. Now, knowing the code better, I'm
> not sure that's feasible. Specially the dependence on a "main" channel
> seems difficult to do away with. It's also somewhat convenient to have a
> maint thread. But we could still attempt to group extra threads, such as
> what we're doing with the new thread pool in the device state series. At
> least thread management could be done entirely in a separate pool, main
> channel and all.
>

* True. To extend the idea of two QEMUs working in tandem, or the
  handshake idea, further with the 'main' channel, let's say a user
  invokes the command:

    $ virsh migrate --threads 4 --postcopy --postcopy-after-precopy ...

0) Channel = TCP socket connection between two machines.

1) The 'main' channel is the dedicated _control_ channel, and the other
   channels are dedicated _data_ channels. So with the '--threads 4'
   option, QEMU creates a total of 5 (main + 4) channels.

     QEMU-A -> 'main' channel   -> QEMU-B
     QEMU-A -> 'data' channel-1 -> QEMU-B
     QEMU-A -> 'data' channel-2 -> QEMU-B
     QEMU-A -> 'data' channel-3 -> QEMU-B
     QEMU-A -> 'data' channel-4 -> QEMU-B

   * Each channel is used by a thread of its own.

2) All channels are created _before_ the migration starts and stay open
   until the end of the migration. No channels pop up asynchronously
   during migration, as the 'postcopy' channel does now.

3) At the beginning, the source says 'Let's Precopy' to the destination
   on the 'main' channel:

     QEMU-A -> main: Let's precopy -> QEMU-B
     QEMU-A <- main: Okay          <- QEMU-B

   And migration data flows from QEMU-A to QEMU-B on the 'data' channels.

     QEMU-A -> 'data' -> -> -> QEMU-B
     QEMU-A -> 'data' -> -> -> QEMU-B
     QEMU-A -> 'data' -> -> -> QEMU-B
     QEMU-A -> 'data' -> -> -> QEMU-B

4) When it's time to switch to Postcopy, the source says 'Let's Postcopy'
   to the destination on the 'main' channel:

     QEMU-A -> main: Let's postcopy -> QEMU-B
     QEMU-A <- main: Okay           <- QEMU-B

   And migration page requests/data use the same 'data' channels.

     QEMU-A <- <- 'request/data' -> -> QEMU-B
     QEMU-A <- <- 'request/data' -> -> QEMU-B
     QEMU-A <- <- 'request/data' -> -> QEMU-B
     QEMU-A <- <- 'request/data' -> -> QEMU-B

5) This way:
   - The 'main' channel could be used to co-ordinate the actions of the
     two QEMUs.
   - All data channels may be used during Postcopy too, instead of the
     single channel used now.
   - There may no longer be race conditions while creating channels.
   - No differentiation of precopy/multifd/postcopy/preempt etc. channels.

(thinking out loud if that sounds workable)

Thank you.
---
  - Prasad
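
P.S. The sequence in steps 1) to 4) above can be sketched as a toy model.
This is only an illustration in Python, not QEMU code: the names Phase,
Source, Destination and run_migration are made up here, the "Refused"
reply is an assumption (the steps above only show "Okay"), and no real
sockets or threads are involved.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    SETUP = "setup"
    PRECOPY = "precopy"
    POSTCOPY = "postcopy"


@dataclass
class Destination:
    """Hypothetical QEMU-B side: answers phase requests on 'main'."""
    phase: Phase = Phase.SETUP

    def on_main_message(self, requested: Phase) -> str:
        # Only forward transitions are accepted: setup -> precopy -> postcopy.
        allowed = {Phase.SETUP: Phase.PRECOPY, Phase.PRECOPY: Phase.POSTCOPY}
        if allowed.get(self.phase) is requested:
            self.phase = requested
            return "Okay"
        return "Refused"  # assumed reply, not part of the proposal


@dataclass
class Source:
    """Hypothetical QEMU-A side: owns 1 'main' + N 'data' channels."""
    threads: int
    phase: Phase = Phase.SETUP
    channels: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def open_channels(self) -> None:
        # Step 2: every channel exists before migration starts.
        self.channels = ["main"] + [
            f"data-{i}" for i in range(1, self.threads + 1)
        ]

    def request_phase(self, dest: Destination, phase: Phase) -> bool:
        # Steps 3 and 4: phase changes are negotiated on 'main' only.
        reply = dest.on_main_message(phase)
        self.log.append((f"Let's {phase.value}", reply))
        if reply == "Okay":
            self.phase = phase
        return reply == "Okay"


def run_migration(threads: int):
    """Walk through the precopy -> postcopy sequence once."""
    src, dst = Source(threads), Destination()
    src.open_channels()
    src.request_phase(dst, Phase.PRECOPY)
    src.request_phase(dst, Phase.POSTCOPY)
    return src, dst
```

Calling run_migration(4) gives a source with five channels ('main' plus
four 'data' channels) and both sides ending in the postcopy phase; the
same data channels are simply reused after the switch.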