On Tue, Oct 29, 2024 at 10:20:24AM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé <berra...@redhat.com> writes:
> 
> > On Fri, Oct 25, 2024 at 02:43:06PM +0100, Daniel P. Berrangé wrote:
> >> 
> >> The migration QAPI design has always felt rather odd to me, in that we
> >> have perfectly good commands "migrate" & "migrate-incoming" that are able
> >> to accept an arbitrary list of parameters when invoked. Instead of passing
> >> parameters to them, though, we require apps to use the separate
> >> migrate-set-parameters/capabilities commands many times over to set
> >> global variables which the later 'migrate' command then uses.
> >> 
> >> The reason for this is essentially a historical mistake - we copied the
> >> way we did it from HMP, which was done that way because HMP was bad at
> >> supporting arbitrary customizable parameters to commands. I wish we
> >> hadn't copied this design over to QMP.
> >> 
> >> To bring it back on topic, we need QMP on the dest to set parameters,
> >> because -incoming was limited to taking only the URI.
> >> 
> >> If the "migrate-incoming" command accepted all parameters directly,
> >> then we could use QAPI visitor to usupport a "-incoming ..." command
> >> that took an arbitrary JSON document and turned it into a call to
> >> "migrate-incoming".
> >> 
> >> With that we would never need QMP on the target for cpr-exec, avoiding
> >> this ordering problem you're facing... assuming we put processing of
> >> -incoming at the right point in the code flow.
> >> 
> >> Can we fix this design and expose the full configurability on the
> >> CLI using the QAPI schema & inline JSON, as we do for other QAPI-ified
> >> CLI args?
> >> 
> >> It seems entirely practical to me to add parameters to 'migrate-incoming'
> >> in a backwards compatible manner and deprecate set-parameters/capabilities.
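
For illustration, the contrast on the QMP wire would be something like the
following (the inline "max-bandwidth" argument to migrate-incoming is
hypothetical here, not an existing interface; migrate-set-parameters and its
"max-bandwidth" parameter do exist today):

```
# today: parameters are global state, set before the command
{ "execute": "migrate-set-parameters",
  "arguments": { "max-bandwidth": 1073741824 } }
{ "execute": "migrate-incoming",
  "arguments": { "uri": "tcp:0:4444" } }

# with the proposal: everything passed inline on the command itself
{ "execute": "migrate-incoming",
  "arguments": { "uri": "tcp:0:4444",
                 "max-bandwidth": 1073741824 } }
```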
> >
> > Incidentally, if we were going to evolve the migration API at all, then
> > it probably ought to start making use of the async job infrastructure
> > we have available. This is used by block jobs, and by the internal snapshot
> 
> I'm all for standardizing on core infrastructure, but unfortunately
> putting migration in a coroutine would open a can of worms. In fact,
> we've been discussing moving the incoming side out of coroutines
> for a while.

Yes, I share the same concern.  I think migration already decided to use
threads as much as it can.

And I paused that attempt to move load() into a thread for now, finding
it still non-trivial to work out the major issue: once the dest load runs
in a thread, it can't hold the BQL for too long, otherwise it blocks the
monitor.

That means we can't take the BQL around _any_ IO operation, because it can
get stuck waiting on the iochannel / NIC, i.e. in any qemu_get*() API invoked.

Meanwhile we still need the BQL from time to time, either in pre_load() /
post_load(), or in some VMStateInfo->get() hooks.  But I can't blindly
take it there either, because any VMStateInfo->get() can invoke
qemu_get*() itself.

So I am not yet sure whether what we'd gain from that is worth the effort
to make it work..

> 
> > commands, and was intended to be used for any case where we had a long
> > running operation triggered by a command. Migration was a poster-child
> > example of what it's intended for, but was left alone when we first
> > introduced the job APIs.
> >
> > The 'job-cancel' API would obsolete 'migrate-cancel'.
> >
> > The other interesting thing is that the job framework creates a well
> > defined lifecycle for a job, that allows querying information about
> > the job after completion, but without QEMU having to keep that info
> > around forever. i.e. once a job has finished, an app can query info
> > about completion, and when it no longer needs that info, it can
> > call 'job-dismiss' to tell QEMU to discard it.
> >
> > If "MigrationState" were associated a job, then it would thus have a
> > clear 'creation' and 'deletion' time.

It'll face the same challenge here of whether we can join() in the main
thread.  IOW, job-cancel can take time, which can also potentially block
qemu from quitting quickly.
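
For reference, the lifecycle being discussed, as it already works for
block jobs over QMP (job-cancel, query-jobs and job-dismiss are existing
commands; a migration job with id "migrate0" is hypothetical):

```
{ "execute": "job-cancel",  "arguments": { "id": "migrate0" } }
{ "execute": "query-jobs" }
{ "execute": "job-dismiss", "arguments": { "id": "migrate0" } }
```

query-jobs would keep reporting the finished job (status "concluded")
until the app calls job-dismiss to discard it.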

-- 
Peter Xu

