Prasad Pandit <ppan...@redhat.com> writes:

> Hello,
>
> On Wed, 19 Feb 2025 at 22:53, Fabiano Rosas <faro...@suse.de> wrote:
>> I don't see anything stopping postcopy_start() from being called in the
>> source in relation to multifd recv threads being setup in the
>> destination. So far it seems possible that the source is opening the
>> preempt channel while multifd still hasn't seen all threads. There's
>> also pre-7.2 machines which create the postcopy channel early.
>
> * If we can not predict the sequence/timings of when different types
> of connections are initiated and processed, maybe source and
> destination QEMUs could work in tandem. ie. before initiating a
> connection, source QEMU could send an 'initiate' message saying I'm
> initiating 'X' connection. Only when destination QEMU says 'okay',
> source QEMU could proceed with actual connection.
>
>     QEMU-A -> Initiate connection type X -> QEMU-B
>     QEMU-A <- okay <-                    <- QEMU-B
>     QEMU-A -> connect type X             -> QEMU-B
>
> (thinking out loud)
>

This is more or less the handshake idea. Or at least it could be
included in that work. I have parked the handshake idea for now because
I don't see an immediate need for it, and IMO there are more pressing
issues to deal with first, such as bugs and coordinating the new
features (and their possible outcomes).

>>>> > * migration_needs_multiple_sockets()
>>> Then it should return 'True' when both migrate_multifd() and
>>> postcopy_preempt() are enabled.
>> Why?
>
> * I was thinking multiple_sockets is multiple types of sockets:
> multifd & postcopy. But it seems here multiple sockets is any type of
> multiple sockets.
>

Yes, this means main channel + others.

>> I thought you meant the CH_MAIN stuff. So now I don't know what you
>> mean. You want to do away with multifd?
>
> * Yes, CH_DEFAULT -> CH_MAIN was introduced in this series to identify
> channels and accordingly call relevant functions.
>
> * Not to do away with multifd, but more of making it same as the main
> channel, ex: virsh migrate --threads <N> N = 1...255. All precopy
> threads/connections behave the same. Differentiation of precopy and
> postcopy shall still exist, because they operate/work in opposite
> directions.
>

I'm not opposed to that idea. When I started working with migration I
had the impression that was the direction and that we could put every
workload in a pool of multifd threads. Now, knowing the code better, I'm
not sure that's feasible. In particular, the dependence on a "main"
channel seems difficult to do away with. It's also somewhat convenient
to have a main thread.

But we could still attempt to group extra threads, such as what we're
doing with the new thread pool in the device state series. At least
thread management could be done entirely in a separate pool, main
channel and all.

>> Continue with this patch and fix the stuff I mentioned. You can ignore
>> the first two paragraphs of that reply.
>>
>> https://lore.kernel.org/r/87y0y4tf5q....@suse.de
>>
>> I still think we need to test that preempt + multifd scenario, but it
>> should be easy to write a test for that once the series is in more of a
>> final shape.
>
> * Okay.
>
>> We can't add magic values, as we've discussed.
>
> Okay.
>
> Thank you.
> ---
> - Prasad
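
For reference, the initiate/okay/connect exchange quoted at the top could
look roughly like the sketch below. This is only an illustration in
Python, not QEMU code: the INIT/OKAY tags, the one-byte channel type, the
separate control socket and all function names are assumptions made up
for the example. The point it shows is the ordering guarantee: the
destination records which channel type to expect before it acks, so the
source's connect can never arrive unannounced.

```python
import socket
import threading

# Hypothetical message tags; nothing like this exists in the migration stream.
INITIATE = b"INIT"
OKAY = b"OKAY"


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed during handshake")
        buf += chunk
    return buf


def source_side(ctrl, channel_type, events):
    """Announce a channel type, wait for the ack, then 'connect'."""
    ctrl.sendall(INITIATE + channel_type)
    if recv_exact(ctrl, len(OKAY)) != OKAY:
        raise RuntimeError("destination refused the channel")
    # A real implementation would open the new connection here.
    events.append(("connect", channel_type))


def destination_side(ctrl, events):
    """Record the announced channel type, then acknowledge it."""
    msg = recv_exact(ctrl, len(INITIATE) + 1)
    if msg[:len(INITIATE)] != INITIATE:
        raise RuntimeError("unexpected message")
    events.append(("expect", msg[len(INITIATE):]))
    ctrl.sendall(OKAY)


def demo():
    """Run both sides over a socketpair and return the ordered events."""
    src, dst = socket.socketpair()
    events = []
    t = threading.Thread(target=destination_side, args=(dst, events))
    t.start()
    source_side(src, b"P", events)  # b"P": made-up tag for the preempt channel
    t.join()
    src.close()
    dst.close()
    return events
```

Calling demo() returns the events in the order they happened; the
"expect" entry always precedes the "connect" entry because the source
blocks until the ack arrives.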