On 10/9/2024 4:36 PM, Peter Xu wrote:
On Wed, Oct 09, 2024 at 04:09:45PM -0400, Steven Sistare wrote:
On 10/9/2024 3:06 PM, Peter Xu wrote:
On Wed, Oct 09, 2024 at 02:43:44PM -0400, Steven Sistare wrote:
On 10/8/2024 3:48 PM, Peter Xu wrote:
On Tue, Oct 08, 2024 at 04:11:38PM -0300, Fabiano Rosas wrote:
As of half an hour ago =) We could put a feature branch up and work
together. If you have more concrete thoughts on what this would look like,
let me know.
[I'll hijack this thread with one more email, as this is not cpr-relevant]
I think I listed all the things I can think of in the wiki, so please go
ahead.
One trivial suggestion is we can start from the very simple, which is the
handshake itself, with a self-bootstrap protocol, probably feature-bit
based or whatever you prefer. Then we set bit 0 saying "this QEMU knows
how to handshake".
Compared to the rest of the requirements, IMHO we can make the channel
establishment the 1st feature, then it's already good for merging, having
feature bit 1 saying "this QEMU understands named channel establishment".
Then we add new feature bits on top of the handshake feature, by adding
more feature bits. Both QEMUs should first handshake on the feature bits
they support and enable only the subset that all support.
Or instead of bits, feature strings etc. would all work, whichever you
prefer. Just to say we don't need to implement all the ideas there, as some
of them might take more time (e.g. device tree check), and that list is
probably not complete anyway.
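
A minimal sketch of the feature-bit negotiation described above, written in
Python only for illustration. The names Feature, DEVICE_TREE_CHECK and
negotiate are hypothetical and do not exist in QEMU; only the meaning of
bit 0 and bit 1 comes from the text above.

```python
from enum import IntFlag


class Feature(IntFlag):
    # Bit 0: "this QEMU knows how to handshake" (from the thread).
    HANDSHAKE = 1 << 0
    # Bit 1: "this QEMU understands named channel establishment" (from the thread).
    NAMED_CHANNELS = 1 << 1
    # Hypothetical later feature, shown only to illustrate adding bits on top.
    DEVICE_TREE_CHECK = 1 << 2


def negotiate(src: Feature, dst: Feature) -> Feature:
    """Return the subset of features that both sides support.

    If either side lacks the handshake bit, nothing is negotiated.
    """
    if not (src & Feature.HANDSHAKE) or not (dst & Feature.HANDSHAKE):
        return Feature(0)
    return src & dst


# Example: a newer source talking to an older destination.
source = Feature.HANDSHAKE | Feature.NAMED_CHANNELS | Feature.DEVICE_TREE_CHECK
dest = Feature.HANDSHAKE | Feature.NAMED_CHANNELS
enabled = negotiate(source, dest)
```

Here enabled contains only the handshake and named-channel bits, since the
destination does not advertise the third one. Feature strings would work the
same way with a set intersection instead of a bitwise AND.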
While writing a qtest for cpr-transfer, I discovered a problem that could be
solved with an early migration handshake, prior to cpr_save_state /
cpr_load_state.
There is currently no way to set migration caps on dest qemu before starting
cpr-transfer, because dest qemu blocks in cpr_state_load before creating any
devices or monitors. It is unblocked after the user sends the migrate command
to source qemu, but then the migration starts and it is too late to set
migration capabilities or parameters on the dest.
Are you OK with that restriction (for now, until a handshake is implemented)?
If not, I have a problem.
I can hack the qtest to make it work with the restriction.
Hmm, the test case is one thing, but if it's a problem, then.. how in real
life could one set migration capabilities on dest qemu for cpr-transfer?
You will allow it via the migration handshake!
But right now, one can enable capabilities by adding -global migration.xxx=yyy
on the target command line.
Those are for debugging only, so we shouldn't suggest using them in
production.. at least that's not the plan.
Yeah, handshake would make it work. But it's not yet there.. :(
Now a similar question, and also what I overlooked previously, is how
cpr-transfer should support "-incoming defer". We need that because that's
what Libvirt uses.. with an upcoming migrate_incoming QMP command.
Defer works. Start dest qemu, issue the migrate command to source qemu.
Dest qemu finishes cpr_load_state and enters the main loop, listening for
monitor commands.
Ahh yes, the HUP works with this case too, that's OK.
Defer works, but it is backwards. I believe the managers would typically send
monitor configuration commands to the dest first, then send the migrate command
to the source. Backwards is weird.
My new proposal addresses this.
What are your thoughts on the other email I wrote? That'll make QMP
available in general on dest, if I read it right. But yeah I think this
issue is not a blocker now at least, so I'm just curious whether that's
still useful.
We may still want to understand one question I raised elsewhere: whether
cpr state save/load must be done while the VM is stopped. If so, then it
means Libvirt will only go with "defer", and QMP set-capabilities might be
counted as downtime there, which can be unfortunate.. Basically, the
question is whether we can still drop patch 4 completely (while the vhost
notifiers can exist in the future, but hopefully not dependent on patch 4).
vhost requires us to stop the vm early:
  qmp_migrate
    stop vm
    migration_call_notifiers MIG_EVENT_PRECOPY_CPR_SETUP
      vhost_cpr_notifier
        vhost_reset_device  - must be after stop vm
                            - and before new qemu inits devices
    cpr_state_save
      unblocks new qemu which inits devices and calls vhost_set_owner
Thus config commands must be sent to the target during the guest pause
interval :(
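
The ordering constraints in the call sequence above can be sketched as a toy
model, in Python only for illustration. This is not QEMU code: the event
strings mirror the names in the sequence, while source_cpr_sequence and
ordering_ok are hypothetical helpers.

```python
def source_cpr_sequence():
    """Return the events in the order given in the call sequence above."""
    return [
        "stop_vm",
        "migration_call_notifiers:MIG_EVENT_PRECOPY_CPR_SETUP",
        "vhost_reset_device",
        "cpr_state_save",
        "dest:init_devices",      # new qemu is unblocked by cpr_state_save
        "dest:vhost_set_owner",
    ]


def ordering_ok(events):
    """Check the two constraints stated for vhost_reset_device.

    It must come after the VM is stopped and before the new qemu
    initializes its devices.
    """
    pos = {name: index for index, name in enumerate(events)}
    return pos["stop_vm"] < pos["vhost_reset_device"] < pos["dest:init_devices"]


good = source_cpr_sequence()

# A hypothetical bad ordering: resetting vhost before stopping the VM.
bad = ["vhost_reset_device", "stop_vm", "cpr_state_save", "dest:init_devices"]
```

In this model ordering_ok(good) holds and ordering_ok(bad) does not. Because
the destination only creates devices and monitors after cpr_state_save, any
configuration sent to it lands after "stop_vm", that is, inside the pause
window.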
My new proposal addresses this.
- Steve