Re: [Qemu-devel] [PATCH v2 00/25] qmp: add async command type

Markus Armbruster Thu, 04 May 2017 12:01:52 -0700

I figure everyone interested has had his say.  Let me try to summarize
and draw my conclusions.

Commands + events can replace asynchronous commands (they did).

Asynchronous commands cannot replace events, because "interesting" side
effects of commands need to be broadcast.  Doesn't mean asynchronous
commands can't make sense for us.  Does mean that we need to address any
serious flaws in the existing commands + event mechanism even if we
adopt asynchronous commands.  The cover letter lists multiple flaws,
which I'll discuss inline.

If an operation satisfies certain conditions, then implementing it as
asynchronous command provides certain benefits.  If the operation

(C1) does not need to broadcast state changes, including start and end
     of operation, and

(C2) even the initiating client needs no state change notifications
     except for end of operation, and

(C3) even that isn't needed when the client loses the connection and
     reconnects, and

(C4) the operation doesn't have to be examined or controlled while it
     runs,

then making it an asynchronous command

(B1) saves you the trouble of defining an event together with its link
     to the command, and

(B2) saves you the trouble of sending the event, and

(B3) saves a bit of QMP traffic, namely the event broadcast.

Letting clients opt into the asynchronousness as this series does
provides an additional benefit:

(B4) when you find one of your synchronous commands is at times taking
     too long, you can compatibly evolve it.

We can remove (C3) and (C4) by providing suitable ways to examine and
control asynchronous commands while they run.

We could remove (C1) and (C2) by sending events, but that would also
lose pretty much all benefits.

The operation chosen to showcase asynchronous commands is taking
screenshots.  It obviously satisfies (C1).  I guess for (C2) and (C4) we
have to assume that it completes "quickly".  I'm willing to buy that.
(C3) seems pretty dubious to me, however.

The tradeoff to judge is desirability of the benefits vs. cost to get
them.

Desirability depends on how many commands satisfy these conditions, and
how much trouble exactly is saved.

A list of commands that is more complete than just "screendump" would
help with the former.  query- commands that need to gather information
in ways that could block are among the candidates.

On the latter, I figure (B1) dwarves (B2), (B3) is a relatively minor
optimization unless proven otherwise, and (B4) is real, but asynchronous
commands are not the only way to get it; we can equally well evolve
synchronous commands compatibly into synchronous + event.

The cost is interface and implementation complexity.

Interface complexity increases because we add the concept "asynchronous
command" and the concept "synchronous or asynchronous command, client's
choice".  This is my main concern.

Implementation complexity isn't prohibitive according to diffstat, but
it's not neglible, either: with the last six patches peeled off, we get
some 500 additional lines plus some 300 in tests/.  The last six patches
put the new infrastructure to work, which takes another 200.  Ways to
examine and control asynchronous commands while they run would add some
more, but we don't know how much exactly.

Now let me take a step back: we need to crack the "jobs" problem anyway.
Merging this series now would in a way add another special case of the
general case "jobs" before adding the general case.  In other words,
create more stuff to unify.  I'm afraid that's a bad idea, not least
since it's an external interface.  I think we should do "jobs" first.
Once we got at least a prototype, we can

* judge how onerous defining and implementing jobs really is, i.e. have
  a clearer idea of the potential benefits of asynchronous commands, and

* explore asynchronous commands as a special case of "jobs" optimized
  for operations that satisfy the conditions above, i.e. get a clearer
  idea of the *actual* additional implementation complexity.

Bottom line:

1. I still don't want to merge this.

2. I want us to tackle jobs sooner rather than later.

3. Once we got at least a jobs prototype, I'm willing to reconsider
   asynchronous commands implemented as special case of jobs.

One of the most important maintainer duties is saying "no".  It's also
one of the least fun duties.

Marc-André Lureau <marcandre.lur...@redhat.com> writes:

> Hi,
>
> One of initial design goals of QMP was to have "asynchronous command
> completion" (http://wiki.qemu.org/Features/QAPI). Unfortunately, that
> goal was not fully achieved, and some broken bits left were removed
> progressively until commit 65207c59d that removed async command
> support.
>
> Note that qmp events are asynchronous messages, and must be handled
> appropriately by the client: dispatch both reply and events after a
> command is sent for example.
>
> The benefits of async commands that can be trade-off depending on the
> requirements are:
>
> 1) allow the command handler to re-enter the main loop if the command
> cannot be handled synchronously, or if it is long-lasting. This is
> currently not possible and make some bugs such as rhbz#1230527 tricky
> (see below) to solve.  Furthermore, many QMP commands do IO and could
> be considered 'slow' and blocking today.
>
> 2) allow concurrent commands and events. This mainly implies hanlding
> concurrency in qemu and out-of-order replies for the client. As noted
> earlier, a good qmp client already has to handle dispatching of
> received messages (reply and events).
>
> The traditional approach to solving the above in qemu is the following
> scheme:
> -> { "execute": "do-foo" }
> <- { "return": {} }
> <- { "event": "FOO_DONE" }
>
> It has several flaws:

Since asynchronous commands can't replace events outright, all of these
flaws stay even if we adopt asynchronous commands.  Any serious flaws
need fixing in other ways.

> - FOO_DONE event has no semantic link with do-foo in the qapi
>   schema. It is not simple to generalize that pattern when writing
>   qmp clients. It makes documentation and usage harder.

I consider this serious enough to be worth fixing.  We can either adopt
suitable naming and documentation conventions, or have commands declare
the events they can trigger.  I think the latter makes sense only if we
can flag violations somehow.

> - the FOO_DONE event has no clear association with the command that
>   triggered it: commands/events have to come up with additional
>   specific association schemes (ids, path etc)

Existing events that need to be linked certainly have one.  But there is
no single documented convention for such links.  Well worth fixing; I'd
love to have a single convention for linking events to commands.

We could then assert that an event's linked command declares the event.

> - FOO_DONE is broadcasted to all clients, but they may have no way to
>   interprete it or interest in it, or worse it may conflict with their
>   own commands.

An event that can be "mislinked" by clients is a design flaw that needs
fixing.  A convention for linking (fix for the previous item) needs to
avoid this flaw.

> - the arguably useless empty reply return

Useless only if the client doesn't need to know that the (possibly
long-running) job has been started successfully.

> For some cases, it makes sense to use that scheme, or a more complete
> one: to have an "handler" associated with an on-going operation, that
> can be queried, modified, cancelled etc (block jobs etc). Also, some
> operations have a global side-effect, in which case that cmd+event
> scheme is right, as all clients are listening for global events.

I'm leaning more and more towards the idea that *any* long-running
operation implemented as command + event should be made an instance of
the future "job" abstraction, to give us a *uniform* way to examine and
control them all.

> However, for the simple case where a client want to perform a "local
> context" operation (dump, query etc), QAPI can easily do better
> without resorting to this pattern: it can send the reply when the
> operation is done. That would make QAPI schema, usage and
> documentation more obvious.
>
> The following series implements an async solution, by introducing a
> context associated with a command, it can:
> - defer the return
> - return only to the caller (no broadcasted event)
> - return with the 'id' associated with the call (as originally intended)
> - optionnally allow cancellation when the client is gone
> - track on-going qapi command(s) per client
>
> 1) existing qemu commands can be gradually replaced by async:true
> variants when needed, and by carefully reviewing the concurrency
> aspects. The async:true commands marshaller helpers are splitted in
> half, the calling and return functions. The command is called with a
> QmpReturn context, that can return immediately or later, using the
> generated return helper.
>
> 2) allow concurrent commands when 'async' is negotiated. If the client
> doesn't support 'async', then the monitor is suspended during command
> execution (events are still sent). Effectively, async commands behave
> like normal commands from client POV in this case, giving full
> backward compatibility.
>
> The screendump command is converted to an async:true version to solve
> rhbz#1230527. The command shows basic cancellation (this could be
> extended if needed). HMP remains sync, but it would be worth to make
> it possible to call async:true qapi commands.
>
> The last patch cleans up qmp_dispatch usage to have consistant checks
> between qga & qemu, and simplify QmpClient/parser_feed usage.

Re: [Qemu-devel] [PATCH v2 00/25] qmp: add async command type

Reply via email to