Re: [PATCH V2 13/13] migration: cpr-transfer mode

Peter Xu Wed, 09 Oct 2024 13:37:22 -0700

On Wed, Oct 09, 2024 at 04:09:45PM -0400, Steven Sistare wrote:
> On 10/9/2024 3:06 PM, Peter Xu wrote:
> > On Wed, Oct 09, 2024 at 02:43:44PM -0400, Steven Sistare wrote:
> > > On 10/8/2024 3:48 PM, Peter Xu wrote:
> > > > On Tue, Oct 08, 2024 at 04:11:38PM -0300, Fabiano Rosas wrote:
> > > > > As of half an hour ago =) We could put a feature branch up and work
> > > > > together. If you have more concrete thoughts on how this would look,
> > > > > let me know.
> > > > 
> > > > [I'll hijack this thread with one more email, as this is not
> > > > cpr-relevant]
> > > > 
> > > > I think I listed all the things I can think of in the wiki, so please go
> > > > ahead.
> > > > 
> > > > One trivial suggestion is we can start with the simplest part, which is
> > > > the handshake itself, with a self-bootstrap protocol, probably feature-bit
> > > > based or whatever you prefer.  Then we set bit 0 saying "this QEMU knows
> > > > how to handshake".
> > > > 
> > > > Compared to the rest of the requirements, IMHO we can make the channel
> > > > establishment the 1st feature, then it's already good for merging, having
> > > > feature bit 1 saying "this QEMU understands named channel establishment".
> > > > 
> > > > Then we add new features on top of the handshake feature, by adding
> > > > more feature bits.  Both QEMUs should first handshake on the feature bits
> > > > they support and enable only the subset that both support.
> > > > 
> > > > Or, instead of bits, feature strings etc. would all work, whichever you
> > > > prefer.  Just to say we don't need to implement all the ideas there, as
> > > > some of them might take more time (e.g. device tree check), and that list
> > > > is probably not complete anyway.
> > > 
> > > While writing a qtest for cpr-transfer, I discovered a problem that could
> > > be solved with an early migration handshake, prior to cpr_save_state /
> > > cpr_load_state.
> > > 
> > > There is currently no way to set migration caps on dest qemu before
> > > starting cpr-transfer, because dest qemu blocks in cpr_state_load before
> > > creating any devices or monitors. It is unblocked after the user sends the
> > > migrate command to source qemu, but then the migration starts and it is too
> > > late to set migration capabilities or parameters on the dest.
> > > 
> > > Are you OK with that restriction (for now, until a handshake is
> > > implemented)?  If not, I have a problem.
> > > 
> > > I can hack the qtest to make it work with the restriction.
> > 
> > Hmm, the test case is one thing, but if it's a problem, then.. how could
> > one set migration capabilities on dest qemu for cpr-transfer in real life?
> 
> You will allow it via the migration handshake!
> But right now, one can enable capabilities by adding -global migration.xxx=yyy
> on the target command line.


Those are for debugging only, so we shouldn't suggest using them in
production.. at least that's not the plan.

Yeah, handshake would make it work.  But it's not yet there.. :(
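
To make the feature-bit idea quoted above a bit more concrete, here is a
rough sketch of the negotiation step only.  It is not QEMU code: the
Feature names, the negotiate() helper and the third bit are all made up for
illustration, and only bit 0 ("knows how to handshake") and bit 1 ("named
channel establishment") come from the discussion.

```python
from enum import IntFlag


class Feature(IntFlag):
    # Hypothetical names; only the meanings of bits 0 and 1 come from the thread.
    HANDSHAKE = 1 << 0           # bit 0: "this QEMU knows how to handshake"
    NAMED_CHANNELS = 1 << 1      # bit 1: "understands named channel establishment"
    DEVICE_TREE_CHECK = 1 << 2   # assumed example of a later feature


def negotiate(src: Feature, dst: Feature) -> Feature:
    """Return the set of features both sides support.

    If either side lacks the handshake bit there is nothing to negotiate,
    so no feature is enabled (assumed fallback behaviour).
    """
    if not (src & Feature.HANDSHAKE) or not (dst & Feature.HANDSHAKE):
        return Feature(0)
    # Enable only the subset that both sides advertise.
    return src & dst


if __name__ == "__main__":
    src = Feature.HANDSHAKE | Feature.NAMED_CHANNELS | Feature.DEVICE_TREE_CHECK
    dst = Feature.HANDSHAKE | Feature.NAMED_CHANNELS
    enabled = negotiate(src, dst)
    print(Feature.NAMED_CHANNELS in enabled)     # both sides support it
    print(Feature.DEVICE_TREE_CHECK in enabled)  # dst does not support it
```

With something like this, an older dest that only advertises a subset of the
bits simply ends up with fewer features enabled, and capabilities would no
longer need to be set on the dest by hand before the migration starts.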

> 
> > Now a similar question, and also what I overlooked previously, is how
> > cpr-transfer should support "-incoming defer".  We need that because that's
> > what Libvirt uses.. with an upcoming migrate_incoming QMP command.
> 
> Defer works.  Start dest qemu, issue the migrate command to source qemu.
> Dest qemu finishes cpr_load_state and enters the main loop, listening for
> monitor commands.

Ahh yes, the HUP works with this case too, that's OK.

What are your thoughts on the other email I wrote?  That'll make QMP
available in general on dest, if I read it right.  But yeah I think this
issue is not a blocker now at least, so I'm just curious whether that's
still useful.

We may still want to understand one question I raised elsewhere: whether
cpr state save/load must be done while the VM is stopped.  If so, then it
means Libvirt will only go with "defer", and QMP set-capabilities might be
counted as downtime there, which would be unfortunate.. Basically, it comes
down to whether we can still drop patch 4 completely (the vhost notifiers
can exist in the future, but hopefully not dependent on patch 4).

-- 
Peter Xu
