On Tue, Oct 29, 2024 at 10:20:24AM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé <berra...@redhat.com> writes:
>
> > On Fri, Oct 25, 2024 at 02:43:06PM +0100, Daniel P. Berrangé wrote:
> >>
> >> The migration QAPI design has always felt rather odd to me, in that we
> >> have perfectly good commands "migrate" & "migrate-incoming" that are able
> >> to accept an arbitrary list of parameters when invoked. Instead of passing
> >> parameters to them though, we instead require apps to use the separate
> >> migrate-set-parameters/capabilities commands many times over to set
> >> global variables which the later 'migrate' command then uses.
> >>
> >> The reason for this is essentially a historical mistake - we copied the
> >> way we did it from HMP, which was this way because HMP was bad at
> >> supporting arbitrary customizable parameters to commands. I wish we
> >> hadn't copied this design over to QMP.
> >>
> >> To bring it back on topic, we need QMP on the dest to set parameters,
> >> because -incoming was limited to only take the URI.
> >>
> >> If the "migrate-incoming" command accepted all parameters directly,
> >> then we could use a QAPI visitor to support a "-incoming ..." command
> >> that took an arbitrary JSON document and turned it into a call to
> >> "migrate-incoming".
> >>
> >> With that we would never need QMP on the target for cpr-exec, avoiding
> >> this ordering problem you're facing... assuming we put processing of
> >> -incoming at the right point in the code flow.
> >>
> >> Can we fix this design and expose the full configurability on the
> >> CLI using QAPI schema & inline JSON, like we do for other QAPI-ified
> >> CLI args?
> >>
> >> It seems entirely practical to me to add parameters to 'migrate-incoming'
> >> in a backwards compatible manner and deprecate set-parameters/capabilities.
> >
> > Incidentally, if we were going to evolve the migration API at all, then
> > it probably ought to start making use of the async job infrastructure
> > we have available. This is used by block jobs, and by the internal snapshot
>
> I'm all for standardization on core infrastructure, but unfortunately
> putting migration in a coroutine would open a can of worms. In fact,
> we've been discussing moving the incoming side out of coroutines
> for a while.
Yes, I share the same concern. I think migration has already gone with the
thread model as much as it can.

I have paused my attempt to move load() into a thread for now, because the
major issue is still non-trivial to work out: once the destination load
runs in a thread, it can't hold the BQL for too long, otherwise it blocks
the monitor. That means we can't hold the BQL across _any_ I/O operation,
because any I/O, i.e. any qemu_get*() call, can get stuck waiting on the
iochannel / NIC.

Meanwhile we still clearly need the BQL from time to time, either in
pre_load() / post_load() or in some VMStateInfo->get() implementations.
But I can't blindly take it there either, because any VMStateInfo->get()
can itself invoke qemu_get*().

So I am not yet sure whether what we would gain from that is worth the
effort to make it work.

>
> > commands, and was intended to be used for any case where we had a long
> > running operation triggered by a command. Migration was a poster-child
> > example of what it's intended for, but was left alone when we first
> > introduced the job APIs.
> >
> > The 'job-cancel' API would obsolete 'migrate-cancel'.
> >
> > The other interesting thing is that the job framework creates a well
> > defined lifecycle for a job, that allows querying information about
> > the job after completion, but without QEMU having to keep that info
> > around forever. ie once a job has finished, an app can query info
> > about completion, and when it no longer needs that info, it can
> > call 'job-dismiss' to tell QEMU to discard it.
> >
> > If "MigrationState" were associated with a job, then it would thus have a
> > clear 'creation' and 'deletion' time.

It would face the same challenge here: whether we can join() in the main
thread. IOW, job-cancel can take time, which could also block QEMU from
quitting quickly.

-- 
Peter Xu
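A toy model in C of the BQL constraint Peter describes may make it more concrete. It is a sketch under loud assumptions, not QEMU code: big_lock stands in for the BQL, chan_get_byte() for a potentially blocking qemu_get*() read, and load_split() / load_nested() are hypothetical loaders. There are no real threads; a stream read performed while the lock is held is simply counted as a violation, which is the point where a real loader thread could stall the monitor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model, not QEMU code: "big_lock" stands in for the BQL and
 * "struct chan" for the incoming migration stream. A real stream read can
 * block waiting on the network, so every read performed while the lock is
 * held is counted as a violation.
 */
static bool big_lock_held;
static int violations;

static void big_lock(void)   { assert(!big_lock_held); big_lock_held = true; }
static void big_unlock(void) { assert(big_lock_held);  big_lock_held = false; }

struct chan {
    const uint8_t *buf;
    size_t len;
    size_t pos;
};

/* Stand-in for a qemu_get_*() style read that may block on real I/O. */
static uint8_t chan_get_byte(struct chan *c)
{
    if (big_lock_held) {
        violations++;
    }
    assert(c->pos < c->len);
    return c->buf[c->pos++];
}

/* Hypothetical device state that may only be modified under the big lock. */
static uint32_t device_reg;

/* Pattern 1: do all stream I/O unlocked, then take the lock briefly to apply. */
static void load_split(struct chan *c)
{
    uint32_t v = chan_get_byte(c);
    v |= (uint32_t)chan_get_byte(c) << 8;

    big_lock();             /* like a post_load() step that needs the lock */
    device_reg = v;
    big_unlock();
}

/*
 * Pattern 2: a get()-style callback that needs the lock *and* reads from the
 * stream itself. Wrapping it in the lock means the reads happen under it.
 */
static void load_nested(struct chan *c)
{
    big_lock();
    uint32_t v = chan_get_byte(c);
    v |= (uint32_t)chan_get_byte(c) << 8;
    device_reg = v;
    big_unlock();
}
```

load_split() works only because its field can be read in full before the lock is taken. load_nested() models a VMStateInfo->get() that needs the BQL and also calls qemu_get*() itself, which this simple split cannot handle without restructuring the callback.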