On 10/8/2024 3:11 PM, Fabiano Rosas wrote:
Peter Xu <pet...@redhat.com> writes:
On Tue, Oct 08, 2024 at 03:28:30PM -0300, Fabiano Rosas wrote:
+ /* Close cpr socket to tell source that we are listening */
+ cpr_state_close();
Would it be possible to use some explicit reply message to mark this?
In theory yes, but I fear that using a return channel with message parsing and
dispatch adds more code than it is worth.
I think this approach is fine for now, but I wonder whether we could
reuse the current return path (RP) by starting it earlier and take
benefit from it already having the message passing infrastructure in
place. I'm actually looking ahead to the migration handshake thread[1],
which could be thought to have some similarity with the early cpr
channel. So having a generic channel in place early on to handle
handshake, CPR, RP, etc. could be a good idea.
The current design relies on CPR stage happens before device realize()s, so
I assume migration channel (including RP) isn't easily applicable at as
early as this stage.
Well, what is the dependency for the RP? If we can send CPR state, we
can send QEMU_VM_COMMAND, no?
The CPR state channel is (and must be) used before migration_object_init,
and before the normal migration channel is opened. Thus we cannot use the
normal return path.
- Steve
However I think dest qemu can directly write back to the cpr_uri channel
instead if we want and then follow a protocol simple enough (even though
it'll be separate from the migration stream protocol).
What worries me more (besides using HUP as of now..) is cpr_state_save() is
currently synchronous and can block the main iothread. It means if cpr
destination is not properly setup, it can hang the main thread (including
e.g. QMP monitor) at qio_channel_socket_connect_sync(). Ideally we
shouldn't block the main thread.
If async-mode can be done, it might be even easier, e.g. if we do
cpr_state_save() in a thread, after qemu_put*() we can directly qemu_get*()
in the same context with the pairing return qemufile.
But maybe we can do it in two steps, merging HUP first. Then when a better
protocol (plus async mode) ready, one can boost QEMU_CPR_FILE_VERSION.
I'll see how Steve wants to address it.
I agree HUP is fine at the moment.
Anyway, I'm probing on this a bit so I can start drafting something. I
got surprised that we don't even have the capability bits in the stream
in a useful way (currently, configuration_validate_capabilities() does
kind of nothing).
1- https://wiki.qemu.org/ToDo/LiveMigration#Migration_handshake
Happy to know this. I was thinking whether I should work on this even
earlier, so if you're looking at that it'll be great.
As of half an hour ago =) We could put a feature branch up and work
together, if you have more concrete thoughts on how this would look like
let me know.
The major pain to me is the channel establishment part where we now have
all kinds of channels, so we should really fix that sooner (e.g., we hope
to enable multifd + postcopy very soon, that requires multifd and preempt
channels appear in the same time). It was reasonable the vfio/multifd
series tried to fix it.