On Wed, Oct 09, 2024 at 03:06:53PM -0400, Peter Xu wrote:
> On Wed, Oct 09, 2024 at 02:43:44PM -0400, Steven Sistare wrote:
> > On 10/8/2024 3:48 PM, Peter Xu wrote:
> > > On Tue, Oct 08, 2024 at 04:11:38PM -0300, Fabiano Rosas wrote:
> > > > As of half an hour ago =) We could put a feature branch up and work
> > > > together, if you have more concrete thoughts on how this would look like
> > > > let me know.
> > >
> > > [I'll hijack this thread with one more email, as this is not cpr-relevant]
> > >
> > > I think I listed all the things I can think of in the wiki, so please go
> > > ahead.
> > >
> > > One trivial suggestion is we can start from the very simple, which is the
> > > handshake itself, with a self-bootstrap protocol, probably feature-bit
> > > based or whatever you prefer. Then we set bit 0 saying "this QEMU knows
> > > how to handshake".
> > >
> > > Comparing to the rest requirement, IMHO we can make the channel
> > > establishment the 1st feature, then it's already good for merging, having
> > > feature bit 1 saying "this qemu understands named channel establishment".
> > >
> > > Then we add new feature bits on top of the handshake feature, by adding
> > > more feature bits. Both QEMUs should first handshake on the feature bits
> > > they support and enable only the subset that all support.
> > >
> > > Or instead of bit, feature strings, etc. would all work which you
> > > prefer. Just to say we don't need to impl all the ideas there, as some of
> > > them might take more time (e.g. device tree check), and that list is
> > > probably not complete anyway.
> >
> > While writing a qtest for cpr-transfer, I discovered a problem that could be
> > solved with an early migration handshake, prior to cpr_save_state /
> > cpr_load_state.
> >
> > There is currently no way to set migration caps on dest qemu before starting
> > cpr-transfer, because dest qemu blocks in cpr_state_load before creating any
> > devices or monitors. It is unblocked after the user sends the migrate command
> > to source qemu, but then the migration starts and it is too late to set
> > migration capabilities or parameters on the dest.
> >
> > Are you OK with that restriction (for now, until a handshake is implemented)?
> > If not, I have a problem.
> >
> > I can hack the qtest to make it work with the restriction.
>
> Hmm, the test case is one thing, but if it's a problem, then.. how in real
> life one could set migration capabilities on dest qemu for cpr-transfer?
>
> Now a similar question, and also what I overlooked previously, is how
> cpr-transfer should support "-incoming defer". We need that because that's
> what Libvirt uses.. with an upcoming migrate_incoming QMP command.
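The feature-bit handshake suggested in the quoted discussion (bit 0 for "this QEMU knows how to handshake", bit 1 for named channel establishment, then enable only the subset both sides support) can be modelled in a few lines. This is a hypothetical sketch, not an existing QEMU interface: `MigFeature` and `negotiate` are names made up for illustration.

```python
from enum import IntFlag


class MigFeature(IntFlag):
    """Hypothetical feature bits, numbered as suggested in the thread."""
    HANDSHAKE = 1 << 0       # "this QEMU knows how to handshake"
    NAMED_CHANNELS = 1 << 1  # "this qemu understands named channel establishment"


def negotiate(local: MigFeature, remote: MigFeature) -> MigFeature:
    """Return the feature set both sides may enable.

    If either side lacks the HANDSHAKE bit it predates the protocol, so
    nothing beyond the legacy behaviour can be assumed.
    """
    if not (local & MigFeature.HANDSHAKE and remote & MigFeature.HANDSHAKE):
        return MigFeature(0)
    # Enable only the subset that both QEMUs support.
    return local & remote


old = MigFeature.HANDSHAKE
new = MigFeature.HANDSHAKE | MigFeature.NAMED_CHANNELS
print(negotiate(new, old))
```

Feature strings, the other option mentioned, would work the same way with a set intersection in place of the bitwise AND.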
Just to share some more thoughts below..

So fundamentally the question is whether cpr can have a predictable window
on dest qemu where we know QMP is ready, but before incoming migration
starts.

With the current design, the incoming side will sequentially do:

  (1) cpr-uri load(),
  (2) initialize the rest of QEMU (migration, qmp, devices, etc.),
  (3) get the listen port ready, then
  (4) close(), aka, HUP.

It looks like there's no way to control when steps 1-4 kick off, so after
the cpr-uri save() data dump they all happen in one shot. That might make
sense because we assumed load() of cpr-uri happens during the blackout
window, and enlarging that window is probably not good.

But.. why do we keep cpr_state_save/load() in the blackout window? AFAIU
they're mostly about fd sharing, so they can happen while the VM is still
running on src, right?

I still remember the vhost/tap issue you mentioned, but I wonder whether
that'll ever change the vhost/tap fd at all if we forbid any device change,
like what we do with normal migrations. IOW, I wonder whether we can always
do cpr_state_save/load() while the VM is running (but it should still
happen during an ACTIVE migration, IOW, device hotplug and such should be
forbidden, just like in a live precopy phase).

Iff that works, then maybe there's a way out: we can make cpr-transfer two
steps:

  - DST: start QEMU dest the same way, with -cpr-uri XXX, but now let's
    assume it's with -incoming defer just to give an example, and no
    migration capabilities applied yet.

  - SRC: send the 'migrate' QMP command. QEMU should see that cpr-transfer
    is enabled, so it only sends the cpr states to the destination; it
    doesn't run the rest of the migration logic. During this stage the src
    VM is always running. We need to make sure the migration state machine
    starts running (perhaps NONE->SETUP_CPR) so device plug/unplug will be
    forbidden like what happens with generic precopy, so as to stabilize
    the fds. Just need to make sure migration_is_running() returns true.

  - DST: receives all cpr states. When complete, it keeps running; no HUP
    is needed this time, because it'll wait for another "migrate_incoming".
    In the case of "-incoming unix:XXX" on the qemu cmdline, it'll directly
    go into the listen code and wait, but still we don't need the HUP
    because we're not in the blackout window, and src won't connect
    automatically but requires a later command from mgmt (see below).

  - DST: the mgmt can now send whatever QMP commands to dest, including
    setting up the incoming port and migration capabilities/parameters if
    needed. Src is still running, so this can be slow.

  - SRC: do the real migration with another "migrate resume=true" QMP
    command (I simply reused postcopy's resume flag here). This time src
    qemu should notice this is a continuation of a cpr-transfer migration,
    so it moves on (SETUP_CPR->ACTIVE) and migrates the RAM/devices/whatever
    is left. Same as generic migration, until COMPLETED.

Not sure whether it'll work. We'll still need to properly handle things
like migrate_cancel when triggered during the SETUP_CPR state, but
hopefully that's not complicated to do..

-- 
Peter Xu
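The two-step cpr-transfer flow proposed above amounts to a small extension of the source-side migration state machine. Below is a toy model of that idea, not QEMU code: SETUP_CPR is the state name proposed in the mail, `is_running()` stands in for `migration_is_running()`, and `MigState`/`SrcMigration` and their methods are hypothetical. Real QEMU passes through extra states (e.g. cancelling) that are omitted here, and the destination side is not modelled.

```python
from enum import Enum, auto


class MigState(Enum):
    NONE = auto()
    SETUP_CPR = auto()   # proposed: cpr state sent, src VM still running
    ACTIVE = auto()      # RAM/device state being migrated
    COMPLETED = auto()
    CANCELLED = auto()   # simplified: real QEMU goes through "cancelling" first


class SrcMigration:
    """Toy model of the source side of the proposed two-step cpr-transfer."""

    def __init__(self) -> None:
        self.state = MigState.NONE

    def is_running(self) -> bool:
        # Stand-in for migration_is_running(): true in SETUP_CPR too, so
        # device plug/unplug is refused and the shared fds stay stable.
        return self.state in (MigState.SETUP_CPR, MigState.ACTIVE)

    def hotplug_allowed(self) -> bool:
        return not self.is_running()

    def migrate(self, resume: bool = False) -> None:
        if not resume:
            # Step 1 ("migrate"): send only the cpr state; the VM keeps running.
            if self.is_running():
                raise RuntimeError("migration already in progress")
            self.state = MigState.SETUP_CPR
        else:
            # Step 2 ("migrate resume=true"): continue as a normal migration.
            if self.state is not MigState.SETUP_CPR:
                raise RuntimeError("no cpr-transfer migration to resume")
            self.state = MigState.ACTIVE

    def complete(self) -> None:
        # RAM/device state fully sent; dest has taken over.
        if self.state is not MigState.ACTIVE:
            raise RuntimeError("migration is not active")
        self.state = MigState.COMPLETED

    def cancel(self) -> None:
        # migrate_cancel has to work in SETUP_CPR as well as in ACTIVE.
        if self.is_running():
            self.state = MigState.CANCELLED
```

In this model the window between `migrate()` and `migrate(resume=True)` is where mgmt would configure the destination (incoming port, capabilities, parameters) while the source VM keeps running with hotplug blocked.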