Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option

Eduardo Habkost Wed, 25 Oct 2017 06:40:13 -0700

On Mon, Oct 23, 2017 at 11:49:13AM +0200, Igor Mammedov wrote:
> On Fri, 20 Oct 2017 12:21:00 -0200
> Eduardo Habkost <ehabk...@redhat.com> wrote:
> 
> > On Fri, Oct 20, 2017 at 12:19:17PM +1100, David Gibson wrote:
> > > On Thu, Oct 19, 2017 at 10:15:48PM -0200, Eduardo Habkost wrote:  
> > > > On Thu, Oct 19, 2017 at 09:42:18PM +1100, David Gibson wrote:  
> > > > > On Mon, Oct 16, 2017 at 02:59:16PM -0200, Eduardo Habkost wrote:  
> > > > > > On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:  
> > > > > > > Signed-off-by: Igor Mammedov <imamm...@redhat.com>
> > > > > > > ---
> > > > > > >  include/sysemu/sysemu.h |  1 +
> > > > > > >  qemu-options.hx         | 15 ++++++++++++++
> > > > > > >  qmp.c                   |  5 +++++
> > > > > > > >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > > >  4 files changed, 74 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > > > > > > index b213696..3feb94f 100644
> > > > > > > --- a/include/sysemu/sysemu.h
> > > > > > > +++ b/include/sysemu/sysemu.h
> > > > > > > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
> > > > > > >      QEMU_WAKEUP_REASON_OTHER,
> > > > > > >  } WakeupReason;
> > > > > > >  
> > > > > > > +void qemu_exit_preconfig_request(void);
> > > > > > >  void qemu_system_reset_request(ShutdownCause reason);
> > > > > > >  void qemu_system_suspend_request(void);
> > > > > > >  void qemu_register_suspend_notifier(Notifier *notifier);
> > > > > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > > > > index 39225ae..bd44db8 100644
> > > > > > > --- a/qemu-options.hx
> > > > > > > +++ b/qemu-options.hx
> > > > > > > @@ -3498,6 +3498,21 @@ STEXI
> > > > > > >  Run the emulation in single step mode.
> > > > > > >  ETEXI
> > > > > > >  
> > > > > > > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > > > > > > +    "-paused [state=]postconf|preconf\n"
> > > > > > > > +    "                postconf: pause QEMU after machine is initialized\n"
> > > > > > > > +    "                preconf: pause QEMU before machine is initialized\n",
> > > > > > > +    QEMU_ARCH_ALL)  
> > > > > > 
> > > > > > I would like to allow pausing before machine-type is selected, so
> > > > > > management could run query-machines before choosing a
> > > > > > machine-type.  Would that need a third "-pause" mode, or will we
> > > > > > be able to change "preconf" to pause before select_machine() is
> > > > > > called?
> > > > > > 
> > > > > > The same probably applies to other things initialized before
> > > > > > machine_run_board_init() that could be configurable using QMP,
> > > > > > including but not limited to:
> > > > > > * Accelerator configuration
> > > > > > * Registering global properties
> > > > > > * RAM size
> > > > > > * SMP/CPU configuration  
> > > > > 
> > > > > Yeah.. having a bunch of different possible pause stages to select
> > > > > doesn't sound great.  
> > > > 
> > > > I agree.  The number of externally visible pause states should be
> > > > as small as possible.
> > > > 
> > > >   
> > > > >                       Could we avoid this by instead changing -S to
> > > > > pause at the earliest possible spot, but having any monitor commands
> > > > > that require a later stage automatically "fast forwarding" to the
> > > > > right phase?  
> > > > 
> > > > That would hide the internal details from the outside.  Sounds
> > > > nice, but adding new machine/device configuration QMP commands
> > > > while hiding the QEMU state from the outside sounds impossible.
> > > > 
> > > > For example, if we use -S today, this works:
> > > > 
> > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 
> > > > 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}  
> > > >   -> {"execute":"qmp_capabilities"}  
> > > >   <- {"return": {}}  
> > > >   -> {"execute":"query-cpus"}  
> > > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 
> > > > 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": 
> > > > "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, 
> > > > "thread_id": 4038}]}
> > > > 
> > > > This means "query-cpus" needs to fast-forward to the CPU creation
> > > > stage if we want to keep compatibility.
> > > > 
> > > > Now, assume we add a set-numa-node command like the one in this
> > > > series.  e.g.:
> > > > 
> > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 
> > > > 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}  
> > > >   -> {"execute":"qmp_capabilities"}  
> > > >   <- {"return": {}}  
> > > >   -> {"execute":"set-numa-node" ... }  
> > > >   <- {"return": ...}
> > > > 
> > > > The command will work only if machine initialization didn't run
> > > > yet.
> > > > 
> > > > But now an innocent-looking query command would change QEMU state
> > > > in an unexpected way:
> > > > 
> > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 
> > > > 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}  
> > > >   -> {"execute":"qmp_capabilities"}  
> > > >   <- {"return": {}}  
> > > >   -> {"execute":"query-cpus"}  [will silently fast-forward QEMU state]  
> > > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 
> > > > 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": 
> > > > "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, 
> > > > "thread_id": 4038}]}  
> > > >   -> {"execute":"set-numa-node" ... }  
> > > >   <- {"error": ...}  [the command will fail because the machine was 
> > > > already created]
> > > > 
> > > > > This means we do have an externally visible "too late to use
> > > > > set-numa-node" QEMU state, and query-cpus will have an externally
> > > > > visible side effect.  Every QMP command would need to document
> > > > > how it affects QEMU state in an externally visible way.
> > > > > 
> > > > > If QEMU pause state is still going to be externally visible this
> > > > > way, I would prefer to let the client explicitly say which state
> > > > > it wants QEMU to be in, instead of making QEMU change state
> > > > > silently as a side effect of QMP commands.  
> > > 
> > > Yeah, good point.  My proposal would just have changed explicitly
> > > exposed ugly internal state to subtly exposed ugly internal state,
> > > which is probably worse :(.
> > > 
> > > 
> > > Ok.. next possibly bad idea..
> > > 
> > > What about a "re-exec" monitor command; it would take what's
> > > essentially a new command line, and basically restart qemu from the
> > > beginning, reparsing this new command line, but without actually 
> > > 
> > > Pro:
> > >   * Mitigates Daniel Berrange's concern about lots of qemu
> > >     configuration being buried in the qmp session - if libvirt logged
> > >     its last "re-exec" that would have what is generally needed.
> > >   * Lets libvirt do assorted investigation of options, then rewind to
> > >     choose what it actually wants  
> > 
> > > Sounds like a superset of Paolo's "-machine none" proposal[1].
> > > It would be a very simple interface, but I'm not sure it can be
> > > implemented efficiently.
> > 
> > [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg488618.html
> > 
> > > 
> > > Con:
> > >   * Would require a bunch of auditing of structures/state to make sure
> > >     they can be re-initialized cleanly  
> > 
> > > This sounds like a big obstacle.  QEMU still has too much global
> > > state outside the machine/qdev tree.
> > 
> > 
> > >   * Would it be fast enough for libvirt to use?  Do we know if the
> > >     slowness which makes multiple qemu invocations by libvirt
> > >     unattractive is from the kernel/libc/ldso overhead, or from qemu's
> > >     internal start up processing?  
> > 
> > My gut feeling is that this could be too slow, if the scope of
> > "re-exec" is too big.
> > 
> > 
> > > Now, let me try to go to the opposite extreme: I think you had a
> > > good point in your previous proposal.  Why should we need to
> > > restart/re-execute anything at all just because some bit of
> > > configuration is being changed by libvirt?  Why should commands
> > > like set-numa-node require QEMU to be in a state that is not
> > > covered by -S?  If the guest is not running yet, there should be
> > > no reason to require clients to explicitly pause/continue/restart
> > > anything.
> It's probably doable to do NUMA config at '-S' time for x86 (arm),
> since ACPI tables are regenerated on the first read (legacy fw_cfg
> would be a little problematic but could probably be 'fixed' as well).
> 
> But I can't say outright whether it's doable for other targets.
> In general, the issue here is that '-S' pauses after machine_done
> has run: all the wiring the board requires is finalized by then,
> and no hooks run after unpause.
> If there is a general consensus to go this route, I can invest
> some time in making it work (then this series could be dropped).


My argument is that it must always be possible to change
configuration using -S (before issuing a 'cont' command), because
the guest is not running at all.  If current QEMU code makes that
difficult, we should address it internally in QEMU.
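
To make the contract I have in mind concrete, here is a toy sketch in
Python.  It is not QEMU code: the ToyVM class and its method names are
made up for illustration, and only the command names set-numa-node,
query-cpus and cont come from this thread.  The only externally
visible transition is 'cont', and query commands have no side effects:

```python
class ToyVM:
    """Toy model (hypothetical, not QEMU): config is allowed until 'cont'."""

    def __init__(self):
        self.running = False   # becomes True only on an explicit cont()
        self.numa_nodes = []   # configuration accumulated while paused

    def set_numa_node(self, nodeid, cpus):
        # Configuration is rejected only once the guest is running.
        if self.running:
            raise RuntimeError("guest already running")
        self.numa_nodes.append({"nodeid": nodeid, "cpus": list(cpus)})

    def query_cpus(self):
        # Pure query: it must not change what later commands may do.
        return [{"CPU": 0, "halted": False}]

    def cont(self):
        # The single state change the client has to know about.
        self.running = True
```

With this model, calling query_cpus() before set_numa_node() is
harmless, which is exactly the property the "fast forward" idea loses.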


> 
> Even so, postponing set-numa to '-S' won't address Daniel's concern,
> i.e. configuration would take several command round trips to complete,
> potentially over a slow network. But as was said, libvirt can cache
> new CLI options for further reuse.
> Whether that is slower or faster than starting qemu with
> '-M foo -smp ...', querying the layout, and then restarting it with
> -numa options would depend on network speed.

True, my argument doesn't address that concern.  But I expect QMP
configuration commands to always be sent through a local socket,
so this is just about the added latency for local QMP round
trips.
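
A rough way to get a feel for that latency is sketched below in
Python.  It does not talk to QEMU: echo_server() is a hypothetical
stand-in that answers every line with an empty QMP-style reply, so the
number it produces is only a lower bound for a local socket round
trip and says nothing about the cost of a real command.

```python
import json
import socket
import threading
import time


def echo_server(conn):
    """Stand-in for a QMP server: reply {"return": {}} to every line."""
    with conn, conn.makefile("rwb") as f:
        for _ in f:
            f.write(b'{"return": {}}\n')
            f.flush()


def measure_round_trips(n=1000):
    """Return (replies received, average seconds per round trip)."""
    client, server = socket.socketpair()
    t = threading.Thread(target=echo_server, args=(server,), daemon=True)
    t.start()
    replies = 0
    # Arguments are omitted on purpose; the toy server ignores the request.
    request = json.dumps({"execute": "set-numa-node"}).encode() + b"\n"
    with client, client.makefile("rwb") as f:
        start = time.perf_counter()
        for _ in range(n):
            f.write(request)
            f.flush()
            if json.loads(f.readline()) == {"return": {}}:
                replies += 1
        elapsed = time.perf_counter() - start
    t.join(timeout=5)  # server thread exits once the client side is closed
    return replies, elapsed / n


if __name__ == "__main__":
    count, avg = measure_round_trips()
    print(f"{count} round trips, {avg * 1e6:.1f} us each on average")
```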

-- 
Eduardo
