Re: Making QEMU easier for management tools and applications

Christophe de Dinechin Sat, 25 Jan 2020 14:35:25 -0800

> On 23 Jan 2020, at 18:58, John Snow <js...@redhat.com> wrote:
> 
> 
> 
> On 1/23/20 2:19 AM, Markus Armbruster wrote:
>> John Snow <js...@redhat.com> writes:
>> 
>>> On 12/24/19 8:41 AM, Daniel P. Berrangé wrote:
>>>>> * scripts/qmp/qmp-shell
>>>>> 
>>>>>  Half-hearted attempt at a human-friendly wrapper around the JSON
>>>>>  syntax.  I have no use for this myself.
>>>> I use this fairly often as its a useful debugging / experimentation
>>>> / trouble shooting tool. There's similar ish functionality in
>>>> virsh qemu-monitor-command. I think there's scope of a supported
>>>> tool here that can talk to libvirt or a UNIX socket for doing
>>>> QMP commands, with a friendlier syntax & pretty printing. 
>>>> 
>>> 
>>> qmp-shell is one of my go-to tools for working through bitmap workflows
>>> where we don't have convenience commands yet, as some of the setups
>>> required for fleecing et al involve quite a number of steps.
>>> 
>>> I can copy-paste raw JSON into a socket, but personally I like seeing my
>>> commands neatly organized in a format where I can visually reduce them
>>> to their components at a glance.
>>> 
>>> (What I mean is: It's hard to remember which QMP commands you've barfed
>>> into a terminal because JSON is hard to read and looks very visually
>>> repetitive.)
>>> 
>>> I tried to rewrite qmp-shell late last year, actually. I wanted to write
>>> a new REPL that was json-aware in some manner such that you could write
>>> multi-line commands like this:
>>> 
>>>> example-command arg={
>>>  "hello": "world"
>>> }
>>> 
>>> This requires, sadly, a streamable JSON parser. Most JSON parsers built
>>> into Python as-is simply take a file pointer and consume the entirety of
>>> the rest of the stream -- they don't play very nice with incomplete
>>> input or input that may have trailing data, e.g.:
>>> 
>>>> example-command arg={
>>>  "hello": "world"
>>> } arg2={
>>>  "oops!": "more json!"
>>> }
>> 
>> QMP is in the same boat: it needs to process input that isn't
>> necessarily full expressions (JSON-text in the RFC's grammar).
>> 
>> Any conventional parser can be made streaming by turning it into a
>> coroutine.  This is probably the simplest solution for handwritten
>> streaming LL parsers, because it permits recursive descent.  In Python,
>> I'd try a generator.
>> 
>> Our actual solution for QMP predates coroutine support in QEMU, and is
>> rather hamfisted:
>> 
>> * Streaming lexer: it gets fed characters one at a time, and when its
>>  state machine says "token complete", it feeds the token to the
>>  "streamer".
>> 
>> * "Streamer": gets fed tokens one at a time, buffers them up counting
>>  curly and square bracket nesting until the nesting is zero, then
>>  passes the buffered tokens to the parser.
>> 
>> * Non-streaming parser: it gets fed a sequence of tokens that constitute
>>  a full expression.
>> 
>> The best I can say about this is that it works.  The streamer's token
>> buffer eats a lot of memory compared to a real streaming parser, but in
>> practice, it's a drop in the bucket.
>> 
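For reference, the bracket-counting "streamer" idea described above can be sketched in a few lines of Python. This is only an illustration, not QEMU's actual code: the function name json_values is made up, it leans on the standard json module for the non-streaming parse, and it only handles top-level objects and arrays (which is all QMP needs), with no error recovery.

```python
import json


def json_values(chars):
    """Yield one parsed JSON value each time bracket nesting returns to zero.

    Hypothetical helper: 'chars' is any iterable of characters, e.g. a
    string or a generator reading from a socket one character at a time.
    """
    buf = []
    depth = 0
    in_string = False
    escaped = False
    for ch in chars:
        if depth == 0 and not in_string and ch.isspace():
            continue  # skip whitespace between top-level values
        buf.append(ch)
        if in_string:
            # Brackets inside strings must not affect the nesting count.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
            if depth == 0:
                # A full expression has been buffered: hand it to the
                # conventional (non-streaming) parser.
                yield json.loads("".join(buf))
                buf.clear()


stream = '{"a": {"b": "}"}} {"c": [1, 2]} {"incomplete"'
values = list(json_values(stream))
```

Here the two complete objects are yielded and the trailing incomplete one simply stays in the buffer, which is the behaviour a REPL needs for multi-line input.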
> 
> I looked into this at one point. I forget why I didn't like it. I had
> some notion that I should replace this one too, but forget exactly why.
> Maybe it wasn't that bad, if I've forgotten.
> 
>>> Also, due to the nature of JSON as being a single discrete object and
>>> never a stream of objects, no existing JSON parser really supports the
>>> idea of ever seeing more than one object per buffer.
>> 
>> That plainly sucks.
>> 
>>> ...So I investigated writing a proper grammar for qmp-shell.
>> 
>> Any parser must start with a proper grammar.  If it doesn't, it's a toy,
>> or a highway to madness.
>> 
>>> Unfortunately, this basically means including the JSON grammar as a
>>> subset of the shell grammar and writing your own parser for it entirely.
>> 
>> Because qmp-shell is a half-hearted wrapper: we ran out of wrapping
>> paper, so JSON sticks out left and right.
>> 
>> Scrap and start over.
>> 
>>> I looked into using Python's own lexer; but it's designed to lex
>>> *python*, not *json*. I got a prototype lexer working for this purpose
>>> under a grammar that I think reflects JSON, but I got that sinking
>>> feeling that it was all more trouble than it was worth, and scrapped
>>> working on it any further.
>> 
>> Parsing JSON is pretty simple.  Data point: QAPISchemaParser parses our
>> weird derivative of JSON in 239 SLOC.
>> 
>>> I did not find any other flex/yacc-like tools that seemed properly
>>> idiomatic or otherwise heavily specialized. I gave up on the idea of
>>> writing a new parser.
>> 
>> While I recommend use of tools for parsing non-trivial grammars (you'll
>> screw up, they won't), they're massive overkill for JSON.
>> 
>>> I'd love to offer a nice robust QMP shell that is available for use by
>>> end users, but the syntax of the shell will need some major considerations.
>> 
>> Scrap and start over.
>> 
>> [...]
>> 
> 
> Yes, I agree: Scrap and start over.
> 
> What SHOULD the syntax look like, though? Clearly the idea of qmp-shell
> is that it offers a convenient way to enter the top-level keys of the
> arguments dict. This works absolutely fine right up until you need to
> start providing nested definitions.

Well, if you are really ready to start from scratch, I might offer the XL syntax
as a starting point for a discussion of a user-visible syntax that is also
applicable for text-based or binary API exchanges.

I’m going to talk about it at FOSDEM in the “minimalist languages” design.
Those who are in Brussels might want to attend to get a better feel.
Source code is here: https://github.com/c3d/xl, but the only part you
care about for this discussion is src/{parser,scanner}.{c,h} and the
syntax configuration file src/xl.syntax. As well as renderer styles
src/xl.stylesheet, src/html.stylesheet, etc.

Key points for the use case considered:
- Tiny (~2000 lines of code for parser/scanner, a C and a C++ implementation)
- Fully introspectable, serializable in a cross-platform way, printable (with styles)
- Character-precise position tracking for error printing
- Parser preserves comments (for documentation generators)
- Small, if slow, interpreter in about 20K lines of code (~bash speed on some tests),
  meaning we would get a “qemu scripting language” with loops, tests, arithmetic, etc.

More detailed discussion at the end of this mail, if you think it warrants a
second look.
In any case, I’d be happy to help connect it to qemu…

> 
> For the nesting, we say: "Go ahead and use JSON, but you have to take
> all the spaces out."

Here, that would be A.B.C, which parses as
(infix.
 (infix.
  A
  B)
 C
)

(result of `xl -nobuiltins -parse test.xl -style debug -show`)
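To make the shape of that tree concrete, here is a rough Python sketch (not the XL parser, just an illustration with made-up function names) that builds the same left-associative infix structure from a dotted path, and then shows one way such a path could be turned into a nested arguments dict. The mapping to nested dicts is my assumption about how a shell would use it.

```python
def parse_dotted(text):
    """Parse 'A.B.C' into a left-associative infix tree of nested tuples."""
    names = [part.strip() for part in text.split(".")]
    if not all(names):
        raise ValueError(f"empty name in {text!r}")
    tree = names[0]
    for name in names[1:]:
        # Each new '.' wraps everything parsed so far as its left child.
        tree = ("infix", ".", tree, name)
    return tree


def flatten(tree):
    """Return the names of a dotted tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    _, _, left, right = tree
    return flatten(left) + flatten(right)


def nest(path, value):
    """Build {'A': {'B': {'C': value}}} from ['A', 'B', 'C']."""
    for key in reversed(path):
        value = {key: value}
    return value


tree = parse_dotted("A.B.C")
args = nest(flatten(tree), 42)
```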

Also, an example given earlier:

 { 'command': 'iothread-set-poll-params',
   'data': {
       'id': 'str',
        '*max-ns': 'uint64',
        '*grow': 'uint64',
        '*shrink': 'uint64'
   },
   'map-to-qom-set': 'IOThread'
 }

could be written as:

command iothread_set_poll_params
        data
                id : str
                *max_ns : uint64
                *grow : uint64
                *shrink : uint64
        map_to_qom_set IOThread

But if you want to keep the original syntax, it seems to parse and render
practically OK:

% cat /tmp/a.xl 
{ 'command': 'iothread-set-poll-params',
   'data': {
       'id': 'str',
        '*max-ns': 'uint64',
        '*grow': 'uint64',
        '*shrink': 'uint64'
   },
   'map-to-qom-set': 'IOThread'
 }

% xl -nobuiltins -parse /tmp/a.xl -show
{ 'command':'iothread-set-poll-params', 'data': { 'id':'str', 
    '*max-ns':'uint64', '*grow':'uint64', '*shrink':'uint64'
}, 'map-to-qom-set':'IOThread' }

This is with no change to the XL parser / scanner code
whatsoever, not even to the syntax file. So that gives me hope
that we could have a “reasonably good” compatibility mode
that transforms the quasi-JSON format into the new form,
with a single parser accepting both.

> 
> This... works, charitably, but is hardly what I would call usable.
> 
> For the CLI, we offer a dot syntax notation that resembles nothing in
> particular. It often seems the case that it isn't expressive enough to
> map losslessly to JSON. I suspect it doesn't handle siblings very well.
> 
> A proper HMP-esque TUI would likely have need of coming up with its own
> pet syntax for commands that avoid complicated nested JSON definitions,
> but for effort:value ratio, having a QMP shorthand shell that works
> arbitrarily with any command might be a better win.

The XL proposal here would be to have a single format shared by
- The source definitions used to generate C code
- The monitor / internal shell syntax
- The command-line syntax
- The API data (possibly in serialized form for compactness)

> 
> Do we still have a general-case problem of how to represent QAPI
> structures in plaintext? Will this need to be solved for the CLI, too?
> 
> --js

More info below.

Here are some aspects that I think are interesting about it:

- Tiny (2000 lines of code for scanner and parser, ~20K for a full interpreter)
C:  wc parser.c parser.h scanner.c scanner.h 
     716    2183   26702 parser.c
     100     440    3372 parser.h
     926    2966   30537 scanner.c
     206     945    8249 scanner.h
    1948    6534   68860 total

C++:
     726    2372   26918 parser.cpp
     885    2480   26363 scanner.cpp
     248    1025    8867 ../include/scanner.h
     166     687    5958 ../include/parser.h
    2025    6564   68106 total

- Simple (parse tree with 8 node types: integer, real, name/symbol, text,
  infix, prefix, postfix and block)

        + integer, e.g. 12, 1_000_000 or 16#33A or 2#10101
        + real, e.g. 11.3, 16#1.FFF#e-3, 2#1.01
        + text, e.g. “ABC”, ‘ABC’, <<Long text, multi-lines>> (configurable separators)
        + name/symbols, e.g. Foo_Bar, +, <=, (precedence and spelling configurable)

        + infix, e.g. A+B, A and B
        + prefix, e.g. +3, sin X
        + postfix, e.g. 3%, 3 km
        + block, e.g. [A], (A), {A} and indentation blocks

- Fully introspectable (mostly because the parse tree is simple)
- Reversible, i.e. can be printed, including with formatting, e.g.:
        % xl -nobuiltins -parse demo/1-hello.xl -show
        tell "localhost",
            print "Hello World"
        % xl -nobuiltins -parse demo/1-hello.xl -style debug -show
        (prefix
         tell
         (infix,
          "localhost"
          (block indent
           (prefix
            print
            "Hello World"
           ))))
- Also has a binary serializer that produces a platform-independent format
- Has multiple implementations, notably C and C++ implementations (and even one in XL :-)
- Validated on thousands of lines of input, with various language styles (e.g. Ada-like or functional)
- Character-level position tracking for error messages in scripts / config files:
        /tmp/xl.xl:1007:8: Mismatched identation, expected ")"
        /tmp/xl.xl:2409:23: Mismatched identation, expected ""
- Designed to be easy to read and write
- Powerful enough to parse itself (https://github.com/c3d/xl/blob/master/xl2/native/xl.parser.xl)
- Dynamically configurable syntax (spelling and precedence of operators)
- Multi-line text with configurable separators, e.g. the following can be made
  a text constant by having XML and END_XML as text separators:
     XML
         <stuff>
            Insert your XML here
        </stuff> 
    END_XML
- Based numbers in any base, e.g. 8#777, 16#FFFF_FFFF and 2#1.001 are valid numbers
- Has essentially a single contributor (me), so easy to relicense as needed
- There is an interpreter, which could e.g. evaluate expressions like 2+3*A
- Relatively fast (6.1s to parse 1M lines representing about 40MB of code, cpp ~1s)
% wc /tmp/tmp.xl
 1000000 3893922 41679700 /tmp/tmp.xl
% time xl -parse /tmp/tmp.xl
 6.10s user 0.21s system 99% cpu 6.346 total
- Supports multiple styles, e.g. using { } for blocks or indentation, parentheses or not, etc.
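Since the based-number notation is probably the least familiar item in that list, here is a small Python sketch of how such literals can be read. It is not taken from the XL scanner; the function name parse_based is hypothetical, and it deliberately ignores the exponent form (as in 16#1.FFF#e-3), which would simply raise ValueError here.

```python
def parse_based(text):
    """Parse 'BASE#DIGITS[.DIGITS]' or a plain decimal literal.

    Underscores are accepted as digit separators, as in 1_000_000.
    Returns an int when there is no fractional part, a float otherwise.
    """
    text = text.replace("_", "")
    if "#" in text:
        base_text, digits = text.split("#", 1)
        base = int(base_text)
    else:
        base, digits = 10, text
    whole, dot, frac = digits.partition(".")
    value = int(whole, base)  # int() accepts bases 2 to 36
    if not dot:
        return value
    # Fractional digits are weighted by negative powers of the base.
    return value + int(frac, base) / base ** len(frac)


examples = {s: parse_based(s) for s in ("8#777", "16#FFFF_FFFF", "2#1.001")}
```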

Cons (but I’m not the best person to come up with cons on this pet project of
mine ;-):
- Idiosyncratic
- Single contributor
- Not well maintained
- Definitely not production quality (even the makefiles are broken ;-)
- Has some CI testing, but it fails, and it’s totally insufficient
- Interpreter far from perfect
- Designed with another purpose in mind (a programming language)
- Syntax is not C-centric, e.g. 16#FFFF instead of 0xFFFF.
- Name syntax does not allow -, i.e. max-ns is “max minus ns”, max_ns OK.
- [insert probably about a thousand others here]

Precedences and other stuff can be configured dynamically, through a file
in the current implementations, e.g.
https://github.com/c3d/xl-c/blob/master/xl.syntax.
So that means we can have a “nice” syntax for the commands and objects,
and a format that can serve both as a config file format, as a command
language, and as a full shell-style language with if, loops, etc.

It also supports nested syntaxes, i.e. dynamic changes of precedence
between selected separators. Used to support simplified C syntax
with “extern int foo();”, where the “C” syntax is active between “extern”
and “;”. Could be useful for compatibility.

Parse tree is simple enough that it’s fully introspectable.
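As a rough illustration of what that buys you, here is a Python model of such a tree and a generic walk over it. This is a sketch and not the XL data structures: the class and field names are mine, and the four leaf types (integer, real, text, name) are folded into a single Leaf class with a kind tag to keep it short.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    kind: str  # "integer", "real", "text" or "name"
    value: object


@dataclass
class Infix:
    name: str  # operator spelling, e.g. "+" or "and"
    left: "Tree"
    right: "Tree"


@dataclass
class Prefix:
    left: "Tree"  # the operator or function, e.g. sin
    right: "Tree"  # the operand, e.g. X


@dataclass
class Postfix:
    left: "Tree"  # the operand, e.g. 3
    right: "Tree"  # the operator or unit, e.g. km


@dataclass
class Block:
    opening: str
    child: "Tree"
    closing: str


Tree = Union[Leaf, Infix, Prefix, Postfix, Block]


def names(tree):
    """Collect every name leaf in the tree, left to right."""
    if isinstance(tree, Leaf):
        return [tree.value] if tree.kind == "name" else []
    if isinstance(tree, Block):
        return names(tree.child)
    # Infix, Prefix and Postfix all have exactly two children.
    return names(tree.left) + names(tree.right)


# 2+3*A, assuming the usual precedence of * over +
expr = Infix("+", Leaf("integer", 2),
             Infix("*", Leaf("integer", 3), Leaf("name", "A")))
used = names(expr)
```

With so few node shapes, any tool that wants to inspect or rewrite a command only needs a handful of cases like the ones in names().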

There is a (configurable) renderer, so you can generate source from the
internal data structure. The renderer can generate colorized source
code in HTML, so I guess we could generate C data structures relatively
easily.

I believe that it is relatively trivial to configure the parser syntax file
to accept the QEMU quasi-JSON (some code changes are required to teach
it to totally ignore whitespace, to avoid error messages).

More complete documentation about the language is here:
https://c3d.github.io/xl, but it’s quite light on implementation details.
So read only if you have a bit of time.