On Wed, Oct 09, 2024 at 03:06:53PM -0400, Peter Xu wrote:
> On Wed, Oct 09, 2024 at 02:43:44PM -0400, Steven Sistare wrote:
> > On 10/8/2024 3:48 PM, Peter Xu wrote:
> > > On Tue, Oct 08, 2024 at 04:11:38PM -0300, Fabiano Rosas wrote:
> > > > As of half an hour ago =) We could put a feature branch up and work
> > > > together, if you have more concrete thoughts on how this would look like
> > > > let me know.
> > > 
> > > [I'll hijack this thread with one more email, as this is not cpr-relevant]
> > > 
> > > I think I listed all the things I can think of in the wiki, so please go
> > > ahead.
> > > 
> > > One trivial suggestion is we can start from the very simple, which is the
> > > handshake itself, with a self-bootstrap protocol, probably feature-bit
> > > based or whatever you prefer.  Then we set bit 0 saying "this QEMU knows
> > > how to handshake".
> > > 
> > > Compared to the remaining requirements, IMHO we can make the channel
> > > establishment the 1st feature, then it's already good for merging, having
> > > feature bit 1 saying "this qemu understands named channel establishment".
> > > 
> > > Then we add new feature bits on top of the handshake feature, by adding
> > > more feature bits.  Both QEMUs should first handshake on the feature bits
> > > they support and enable only the subset that all support.
> > > 
> > > Or instead of bit, feature strings, etc. would all work which you
> > > prefer. Just to say we don't need to impl all the ideas there, as some of
> > > them might take more time (e.g. device tree check), and that list is
> > > probably not complete anyway.
> > 
> > While writing a qtest for cpr-transfer, I discovered a problem that could be
> > solved with an early migration handshake, prior to cpr_save_state / 
> > cpr_load_state.
> > 
> > There is currently no way to set migration caps on dest qemu before starting
> > cpr-transfer, because dest qemu blocks in cpr_state_load before creating any
> > devices or monitors. It is unblocked after the user sends the migrate command
> > to source qemu, but then the migration starts and it is too late to set
> > migration capabilities or parameters on the dest.
> > 
> > Are you OK with that restriction (for now, until a handshake is implemented)?
> > If not, I have a problem.
> > 
> > I can hack the qtest to make it work with the restriction.
> 
> Hmm, the test case is one thing, but if it's a problem, then.. how in real
> life one could set migration capabilities on dest qemu for cpr-transfer?
> 
> Now a similar question, and also what I overlooked previously, is how
> cpr-transfer should support "-incoming defer".  We need that because that's
> what Libvirt uses.. with an upcoming migrate_incoming QMP command.

Just to share some more thoughts below..

So fundamentally the question is whether there's some way cpr can have a
predictable window on dest qemu that we know QMP is ready, but before
incoming migration starts.

With the current design, the incoming side will sequentially do: (1) cpr-uri
load(), (2) initialize the rest of QEMU (migration, qmp, devices, etc.), (3)
get the listen port ready, then (4) close(), aka, HUP.  It looks like we have
no way to control when steps 1-4 are kicked off, so once the cpr-uri save()
data is dumped they'll happen in one shot.

It might make sense because we assumed load() of cpr-uri happens during the
blackout window, and enlarging that is probably not good.

But.. why do we keep cpr_state_save/load() in the blackout window?  AFAIU
they're mostly about sharing the fds, so they can happen with the VM still
running on src, right?

I still remember the vhost/tap issue you mentioned, but I wonder whether
the vhost/tap fd will ever change at all if we forbid any device change
like what we do with normal migrations. IOW, I wonder whether we can always
do the cpr_state_save/load() while the VM is running (but it should still
be during an ACTIVE migration, so device hotplug and stuff should be
forbidden, just like a live precopy phase).

Iff that works, then maybe there's a way out: we can make cpr-transfer two
steps:

  - DST: start QEMU dest the same, with -cpr-uri XXX, but now let's assume
    it's with -incoming defer just to give an example, and no migration
    capabilities applied yet.

  - SRC: send 'migrate' QMP command, qemu should see that cpr-transfer is
    enabled, so it triggers sending cpr states to destination only.  It
    doesn't run the rest of the migration logic.

    During this stage src VM will always be running, we need to make sure
    the migration state machine starts running (perhaps NONE->SETUP_CPR) so
    device plug/unplug will be forbidden like what happens with generic
    precopy, so as to stabilize fds.  Just need to make sure
    migration_is_running() returns true.

  - DST: receives all cpr states.  When complete, it keeps running, no HUP
    is needed this time, because it'll wait for another "migrate_incoming".

    In the case of "-incoming unix:XXX" in qemu cmdline, it'll directly go
    into the listen code and wait, but still we don't need the HUP because
    we're not in blackout window, and src won't connect automatically but
    requires a command later from mgmt (see below).

  - DST: the mgmt can send whatever QMP command to dest now, including
    setup incoming port, setup migration capabilities/parameters if needed.
    Src is still running, so it can be slow.

  - SRC: do the real migration with another "migrate resume=true" QMP
    command (I simply reused postcopy's resume flag here).  This time src
    qemu should notice this is a continuation of cpr-transfer migration,
    then it moves that on (SETUP_CPR->ACTIVE), migrate RAM/device/whatever
    is left.  Same as generic migration from there, until COMPLETED.
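
To make the ordering in the steps above easier to see from the mgmt side,
here's a rough sketch in Python that only builds the QMP messages in the order
they'd be sent.  It doesn't talk to any QEMU.  migrate, migrate-incoming and
migrate-set-capabilities are existing QMP commands, but using the first
'migrate' to ship only cpr state and reusing resume=true to continue are just
the proposal here, not current behavior.  The helper name, the uri and the
capability name are made up for the example.

```python
import json


def cpr_transfer_plan(main_uri, dst_caps):
    """Return (side, QMP message) pairs in the order mgmt would send them.

    Hypothetical helper: it models the proposed two-step flow only.
    """
    return [
        # SRC: first 'migrate' ships cpr state only; the VM keeps running.
        ("src", {"execute": "migrate", "arguments": {"uri": main_uri}}),
        # DST: QMP is reachable now, so capabilities can still be set.
        ("dst", {"execute": "migrate-set-capabilities",
                 "arguments": {"capabilities": [
                     {"capability": cap, "state": True} for cap in dst_caps]}}),
        # DST: set up the incoming port (the "-incoming defer" case).
        ("dst", {"execute": "migrate-incoming",
                 "arguments": {"uri": main_uri}}),
        # SRC: continue with the real migration (assumed reuse of 'resume').
        ("src", {"execute": "migrate",
                 "arguments": {"uri": main_uri, "resume": True}}),
    ]


if __name__ == "__main__":
    for side, msg in cpr_transfer_plan("unix:/tmp/mig.sock", ["multifd"]):
        print(side, json.dumps(msg))
```

The point is only that everything sent to dst sits between the two 'migrate'
commands on src, while src is still running.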

Not sure whether it'll work.  We'll still need to properly handle things
like migrate_cancel, etc, when triggered during SETUP_CPR state, but
hopefully that's not complicated to do..
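
For illustration, the state changes I have in mind could be modelled like the
toy below.  This is Python, not QEMU code: SETUP_CPR is the hypothetical new
state, the event names are invented, and is_running() only stands in for
migration_is_running().  Where a cancel during SETUP_CPR should really end up
is an open question; CANCELLED is just my assumption.

```python
from enum import Enum, auto


class State(Enum):
    NONE = auto()
    SETUP_CPR = auto()   # hypothetical new state from this proposal
    ACTIVE = auto()
    COMPLETED = auto()
    CANCELLED = auto()


# Allowed transitions in this toy model, keyed by (current state, event).
TRANSITIONS = {
    (State.NONE, "migrate"): State.SETUP_CPR,
    (State.SETUP_CPR, "migrate_resume"): State.ACTIVE,
    (State.SETUP_CPR, "migrate_cancel"): State.CANCELLED,
    (State.ACTIVE, "migrate_cancel"): State.CANCELLED,
    (State.ACTIVE, "done"): State.COMPLETED,
}


class ToyMigration:
    def __init__(self):
        self.state = State.NONE

    def handle(self, event):
        """Apply one event, rejecting anything the table doesn't allow."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} not allowed in {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state

    def is_running(self):
        # True from SETUP_CPR onwards, so device plug/unplug stays blocked
        # while the cpr state (fds) is being transferred.
        return self.state in (State.SETUP_CPR, State.ACTIVE)

    def hotplug_allowed(self):
        return not self.is_running()
```

With this, hotplug is refused as soon as the first 'migrate' arrives, and a
second plain "migrate" while in SETUP_CPR is rejected rather than restarting
the transfer.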

-- 
Peter Xu

