On Tue, Oct 08, 2024 at 03:28:30PM -0300, Fabiano Rosas wrote:
> >>> +    /* Close cpr socket to tell source that we are listening */
> >>> +    cpr_state_close();
> >> 
> >> Would it be possible to use some explicit reply message to mark this?  
> >
> > In theory yes, but I fear that using a return channel with message parsing 
> > and
> > dispatch adds more code than it is worth.
> 
> I think this approach is fine for now, but I wonder whether we could
> reuse the current return path (RP) by starting it earlier and take
> benefit from it already having the message passing infrastructure in
> place. I'm actually looking ahead to the migration handshake thread[1],
> which could be thought to have some similarity with the early cpr
> channel. So having a generic channel in place early on to handle
> handshake, CPR, RP, etc. could be a good idea.

The current design relies on CPR stage happens before device realize()s, so
I assume migration channel (including RP) isn't easily applicable at as
early as this stage.

However I think dest qemu can directly write back to the cpr_uri channel
instead if we want and then follow a protocol simple enough (even though
it'll be separate from the migration stream protocol).

What worries me more (besides using HUP as of now..) is cpr_state_save() is
currently synchronous and can block the main iothread.  It means if cpr
destination is not properly setup, it can hang the main thread (including
e.g. QMP monitor) at qio_channel_socket_connect_sync().  Ideally we
shouldn't block the main thread.

If async-mode can be done, it might be even easier, e.g. if we do
cpr_state_save() in a thread, after qemu_put*() we can directly qemu_get*()
in the same context with the pairing return qemufile.

But maybe we can do it in two steps, merging HUP first.  Then when a better
protocol (plus async mode) ready, one can boost QEMU_CPR_FILE_VERSION.
I'll see how Steve wants to address it.

> 
> Anyway, I'm probing on this a bit so I can start drafting something. I
> got surprised that we don't even have the capability bits in the stream
> in a useful way (currently, configuration_validate_capabilities() does
> kind of nothing).
> 
> 1- https://wiki.qemu.org/ToDo/LiveMigration#Migration_handshake

Happy to know this. I was thinking whether I should work on this even
earlier, so if you're looking at that it'll be great.

The major pain to me is the channel establishment part where we now have
all kinds of channels, so we should really fix that sooner (e.g., we hope
to enable multifd + postcopy very soon, that requires multifd and preempt
channels appear in the same time).  It was reasonable the vfio/multifd
series tried to fix it.

-- 
Peter Xu


Reply via email to