> On 23 Jan 2020, at 18:58, John Snow <js...@redhat.com> wrote:
>
>
>
> On 1/23/20 2:19 AM, Markus Armbruster wrote:
>> John Snow <js...@redhat.com> writes:
>>
>>> On 12/24/19 8:41 AM, Daniel P. Berrangé wrote:
>>>>> * scripts/qmp/qmp-shell
>>>>>
>>>>> Half-hearted attempt at a human-friendly wrapper around the JSON
>>>>> syntax. I have no use for this myself.
>>>> I use this fairly often as it's a useful debugging / experimentation
>>>> / troubleshooting tool. There's similar-ish functionality in
>>>> virsh qemu-monitor-command. I think there's scope for a supported
>>>> tool here that can talk to libvirt or a UNIX socket for doing
>>>> QMP commands, with a friendlier syntax & pretty printing.
>>>>
>>>
>>> qmp-shell is one of my go-to tools for working through bitmap workflows
>>> where we don't have convenience commands yet, as some of the setups
>>> required for fleecing et al involve quite a number of steps.
>>>
>>> I can copy-paste raw JSON into a socket, but personally I like seeing my
>>> commands neatly organized in a format where I can visually reduce them
>>> to their components at a glance.
>>>
>>> (What I mean is: It's hard to remember which QMP commands you've barfed
>>> into a terminal because JSON is hard to read and looks very visually
>>> repetitive.)
>>>
>>> I tried to rewrite qmp-shell late last year, actually. I wanted to write
>>> a new REPL that was json-aware in some manner such that you could write
>>> multi-line commands like this:
>>>
>>>> example-command arg={
>>> "hello": "world"
>>> }
>>>
>>> This requires, sadly, a streamable JSON parser. Most JSON parsers built
>>> into Python as-is simply take a file pointer and consume the entirety of
>>> the rest of the stream -- they don't play very nice with incomplete
>>> input or input that may have trailing data, e.g.:
>>>
>>>> example-command arg={
>>> "hello": "world"
>>> } arg2={
>>> "oops!": "more json!"
>>> }
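For the trailing-data case, Python's standard `json` module does offer `json.JSONDecoder.raw_decode()`. It returns the decoded value together with the index where it stopped, so a line of `name=<JSON>` arguments can be consumed one value at a time. This is a minimal sketch. It assumes, hypothetically, that every value after `=` is strict JSON, which is stricter than what qmp-shell accepts today. It does not solve incomplete input, which still needs buffering or a streaming parser:

```python
import json
import re

_decoder = json.JSONDecoder()
# Hypothetical argument-name syntax: letters, digits, '_' and '-', then '='.
_KEY = re.compile(r"\s*([A-Za-z0-9_-]+)=")


def parse_args(line: str) -> dict:
    """Parse 'name=<JSON value>' pairs that may span several lines.

    raw_decode() returns (value, end_index) and ignores whatever follows,
    so each value's trailing data is left for the next iteration.
    """
    args = {}
    pos = 0
    while True:
        m = _KEY.match(line, pos)
        if m is None:
            break
        # The JSON value must start right after '='; raw_decode raises
        # json.JSONDecodeError (a ValueError) on invalid or incomplete input.
        value, pos = _decoder.raw_decode(line, m.end())
        args[m.group(1)] = value
    if line[pos:].strip():
        raise ValueError(f"unexpected input at offset {pos}: {line[pos:]!r}")
    return args


line = 'arg={\n "hello": "world"\n} arg2={\n "oops!": "more json!"\n}'
print(parse_args(line))
```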
>>
>> QMP is in the same boat: it needs to process input that isn't
>> necessarily full expressions (JSON-text in the RFC's grammar).
>>
>> Any conventional parser can be made streaming by turning it into a
>> coroutine. This is probably the simplest solution for handwritten
>> streaming LL parsers, because it permits recursive descent. In Python,
>> I'd try a generator.
>>
>> Our actual solution for QMP predates coroutine support in QEMU, and is
>> rather hamfisted:
>>
>> * Streaming lexer: it gets fed characters one at a time, and when its
>> state machine says "token complete", it feeds the token to the
>> "streamer".
>>
>> * "Streamer": gets fed tokens one at a time, buffers them up counting
>> curly and square bracket nesting until the nesting is zero, then
>> passes the buffered tokens to the parser.
>>
>> * Non-streaming parser: it gets fed a sequence of tokens that constitute
>> a full expression.
>>
>> The best I can say about this is that it works. The streamer's token
>> buffer eats a lot of memory compared to a real streaming parser, but in
>> practice, it's a drop in the bucket.
>>
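The lexer/streamer/parser pipeline above can be approximated in a few lines of Python with a generator, which is also the shape of the coroutine approach. This is a minimal sketch, not QEMU's implementation. It folds the lexer into the streamer by tracking only string literals and escapes, so brackets inside strings are not counted. It assumes every top-level value is an object or array, as QMP commands are, and it uses `json.loads` as the non-streaming parser:

```python
import json
from typing import Any, Iterable, Iterator


def stream_values(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield each complete top-level JSON object or array found in chunks.

    Chunks may split a value anywhere. Characters are buffered while the
    {}/[] nesting depth is above zero; brackets inside string literals are
    ignored. When the depth returns to zero, the buffer goes to json.loads.
    """
    buf = []
    depth = 0
    in_string = False
    escaped = False
    for chunk in chunks:
        for ch in chunk:
            if depth == 0 and ch.isspace():
                continue  # skip whitespace between top-level values
            buf.append(ch)
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch in "{[":
                depth += 1
            elif ch in "}]":
                depth -= 1
                if depth < 0:
                    raise ValueError("unbalanced closing bracket")
                if depth == 0:
                    yield json.loads("".join(buf))
                    buf = []


for value in stream_values(['{"execute": "query-', 'status"}{"execute": "stop"}']):
    print(value)
```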
>
> I looked into this at one point. I forget why I didn't like it. I had
> some notion that I should replace this one too, but forget exactly why.
> Maybe it wasn't that bad, if I've forgotten.
>
>>> Also, due to the nature of JSON as being a single discrete object and
>>> never a stream of objects, no existing JSON parser really supports the
>>> idea of ever seeing more than one object per buffer.
>>
>> That plainly sucks.
>>
>>> ...So I investigated writing a proper grammar for qmp-shell.
>>
>> Any parser must start with a proper grammar. If it doesn't, it's a toy,
>> or a highway to madness.
>>
>>> Unfortunately, this basically means including the JSON grammar as a
>>> subset of the shell grammar and writing your own parser for it entirely.
>>
>> Because qmp-shell is a half-hearted wrapper: we ran out of wrapping
>> paper, so JSON sticks out left and right.
>>
>> Scrap and start over.
>>
>>> I looked into using Python's own lexer; but it's designed to lex
>>> *python*, not *json*. I got a prototype lexer working for this purpose
>>> under a grammar that I think reflects JSON, but I got that sinking
>>> feeling that it was all more trouble than it was worth, and scrapped
>>> working on it any further.
>>
>> Parsing JSON is pretty simple. Data point: QAPISchemaParser parses our
>> weird derivative of JSON in 239 SLOC.
>>
>>> I did not find any other flex/yacc-like tools that seemed properly
>>> idiomatic or otherwise heavily specialized. I gave up on the idea of
>>> writing a new parser.
>>
>> While I recommend use of tools for parsing non-trivial grammars (you'll
>> screw up, they won't), they're massive overkill for JSON.
>>
>>> I'd love to offer a nice robust QMP shell that is available for use by
>>> end users, but the syntax of the shell will need some major considerations.
>>
>> Scrap and start over.
>>
>> [...]
>>
>
> Yes, I agree: Scrap and start over.
>
> What SHOULD the syntax look like, though? Clearly the idea of qmp-shell
> is that it offers a convenient way to enter the top-level keys of the
> arguments dict. This works absolutely fine right up until you need to
> start providing nested definitions.
Well, if you are really ready to start from scratch, I might offer the XL syntax
as a starting point for a discussion of a user-visible syntax that is also
applicable for text-based or binary API exchanges.
I’m going to talk about it at FOSDEM in the “minimalist languages” devroom.
Those who are in Brussels might want to attend to get a better feel.
Source code is here: https://github.com/c3d/xl, but the only parts you
care about for this discussion are src/{parser,scanner}.{c,h}, the
syntax configuration file src/xl.syntax, and the renderer styles
src/xl.stylesheet, src/html.stylesheet, etc.
Key points for the use case considered:
- Tiny (~2000 lines of code for parser/scanner, a C and a C++ implementation)
- Fully introspectable, serializable in a cross-platform way, printable (with
styles)
- Character-precise position tracking for error printing
- Parser preserves comments (for documentation generators)
- Small, if slow, interpreter in about 20K lines of code (~bash speed on some
tests), which means we would get a “qemu scripting language” with loops,
tests, arithmetic, etc.
More detailed discussion at end of this mail if you think it warrants a second
look.
In any case, if it helps, I’d be happy to help connect it to QEMU…
>
> For the nesting, we say: "Go ahead and use JSON, but you have to take
> all the spaces out."
Here, that would be A.B.C, which parses as
(infix.
  (infix.
    A
    B)
  C
)
(result of `xl -nobuiltins -parse test.xl -style debug -show`)
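The left-nested tree above is what precedence climbing produces for a left-associative operator. Here is a minimal Python sketch that handles infix operators only. Its precedence table is made up and stands in for what XL reads from its syntax file. The tuple tree and `show()` output only imitate the `-style debug` rendering:

```python
import re
from typing import Union

# A tree is either a bare name or ("infix", operator, left, right).
Tree = Union[str, tuple]

# Hypothetical precedence table (higher binds tighter). XL reads its real
# table from a syntax file such as xl.syntax; these numbers are made up.
PRECEDENCE = {".": 100, "*": 40, "+": 30}


def tokenize(src: str) -> list:
    """Split into names, integers and the operators listed above."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[.*+]", src)


def parse(tokens: list, min_prec: int = 0) -> Tree:
    """Precedence climbing over infix operators only (no prefix, postfix
    or blocks). Consumes tokens from the front of the list."""
    left: Tree = tokens.pop(0)
    while tokens and PRECEDENCE.get(tokens[0], -1) >= min_prec:
        op = tokens.pop(0)
        # Parsing the right side at a strictly higher level makes operators
        # of equal precedence group to the left: (A.B).C
        right = parse(tokens, PRECEDENCE[op] + 1)
        left = ("infix", op, left, right)
    return left


def show(tree: Tree) -> str:
    """One-line rendering in the spirit of `-style debug`."""
    if isinstance(tree, str):
        return tree
    _, op, left, right = tree
    return f"(infix{op} {show(left)} {show(right)})"


print(show(parse(tokenize("A.B.C"))))    # (infix. (infix. A B) C)
print(show(parse(tokenize("A+B*C.D"))))  # (infix+ A (infix* B (infix. C D)))
```

Changing a number in `PRECEDENCE` changes the grouping without touching the parser, which is the property the configurable syntax file relies on.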
Also, an example given earlier:
{ 'command': 'iothread-set-poll-params',
  'data': {
      'id': 'str',
      '*max-ns': 'uint64',
      '*grow': 'uint64',
      '*shrink': 'uint64'
  },
  'map-to-qom-set': 'IOThread'
}
could be written as:
command iothread_set_poll_params
data
    id : str
    *max_ns : uint64
    *grow : uint64
    *shrink : uint64
map_to_qom_set IOThread
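Reading that indentation-based form back into the JSON structure takes a small stack of open blocks. This is a minimal sketch. It assumes the four members are indented under `data` and that a bare name opens a block. It also assumes, hypothetically, that `_` maps back to `-` in every name and value, which would be wrong for values that really contain `_`:

```python
def parse_indented(text: str) -> dict:
    """Turn the indentation-based form back into nested dicts.

    Assumed (hypothetical) rules: deeper indentation opens a block under the
    previous bare name; "name value" or "name : value" sets a leaf; '_' in
    names and values maps back to '-' (XL names cannot contain '-').
    """
    root: dict = {}
    stack = [(-1, root)]  # (indent, dict) for each currently open block
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        while stack[-1][0] >= indent:
            stack.pop()  # close blocks at the same or deeper indentation
        parent = stack[-1][1]
        key, _, rest = line.strip().partition(" ")
        key = key.replace("_", "-")
        value = rest.lstrip(":").strip()
        if value:
            parent[key] = value.replace("_", "-")
        else:
            child: dict = {}
            parent[key] = child
            stack.append((indent, child))
    return root


NEW_FORM = """\
command iothread_set_poll_params
data
    id : str
    *max_ns : uint64
    *grow : uint64
    *shrink : uint64
map_to_qom_set IOThread
"""
print(parse_indented(NEW_FORM))
```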
But if you want to keep the original syntax, it seems to parse and render
practically OK:
% cat /tmp/a.xl
{ 'command': 'iothread-set-poll-params',
'data': {
'id': 'str',
'*max-ns': 'uint64',
'*grow': 'uint64',
'*shrink': 'uint64'
},
'map-to-qom-set': 'IOThread'
}
% xl -nobuiltins -parse /tmp/a.xl -show
{ 'command':'iothread-set-poll-params', 'data': { 'id':'str',
'*max-ns':'uint64', '*grow':'uint64', '*shrink':'uint64'
}, 'map-to-qom-set':'IOThread' }
This is with no change to the XL parser / scanner code
whatsoever, not even to the syntax file. So that gives me hope
that we could have a “reasonably good” compatibility mode
that transforms the quasi-JSON format into the new form,
with a single parser accepting both.
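For this particular example, such a compatibility transform can also be prototyped outside XL. Single-quoted strings make it a valid Python literal, so `ast.literal_eval` reads it, and a short emitter produces the indentation-based form. This is a sketch only. The `key value` / `key : value` choice simply mirrors the example. Real QAPI files, with several top-level expressions and bare `true`/`false`, would need more than `literal_eval`:

```python
import ast


def to_indented(value: dict, depth: int = 0) -> list:
    """Render nested dicts in the indentation-based form, one key per line.

    Hypothetical conventions mirroring the example: '-' becomes '_' (XL
    names cannot contain '-'), a nested dict opens an indented block, and
    leaves are "key value" at top level but "key : value" when nested.
    """
    lines = []
    pad = "    " * depth
    sep = " " if depth == 0 else " : "
    for key, val in value.items():
        name = key.replace("-", "_")
        if isinstance(val, dict):
            lines.append(pad + name)
            lines.extend(to_indented(val, depth + 1))
        else:
            lines.append(pad + name + sep + str(val).replace("-", "_"))
    return lines


QUASI_JSON = """{ 'command': 'iothread-set-poll-params',
  'data': {
      'id': 'str',
      '*max-ns': 'uint64',
      '*grow': 'uint64',
      '*shrink': 'uint64'
  },
  'map-to-qom-set': 'IOThread'
}"""

schema = ast.literal_eval(QUASI_JSON)  # works here because it is a Python literal
print("\n".join(to_indented(schema)))
```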
>
> This... works, charitably, but is hardly what I would call usable.
>
> For the CLI, we offer a dot syntax notation that resembles nothing in
> particular. It often seems the case that it isn't expressive enough to
> map losslessly to JSON. I suspect it doesn't handle siblings very well.
>
> A proper HMP-esque TUI would likely have need of coming up with its own
> pet syntax for commands that avoid complicated nested JSON definitions,
> but for effort:value ratio, having a QMP shorthand shell that works
> arbitrarily with any command might be a better win.
The XL proposal here would be to have a single format shared by
- The source definitions used to generate C code
- The monitor / internal shell syntax
- The command-line syntax
- The API data (possibly in serialized form for compactness)
>
> Do we still have a general-case problem of how to represent QAPI
> structures in plaintext? Will this need to be solved for the CLI, too?
>
> --js
More info below.
Here are some aspects that I think are interesting about it:
- Tiny (2000 lines of code for scanner and parser, ~20K for a full interpreter)
  C: wc parser.c parser.h scanner.c scanner.h
       716    2183   26702 parser.c
       100     440    3372 parser.h
       926    2966   30537 scanner.c
       206     945    8249 scanner.h
      1948    6534   68860 total
  C++:
       726    2372   26918 parser.cpp
       885    2480   26363 scanner.cpp
       248    1025    8867 ../include/scanner.h
       166     687    5958 ../include/parser.h
      2025    6564   68106 total
- Simple (parse tree with 8 node types: integer, real, name/symbol, text,
infix, prefix, postfix and block)
+ integer, e.g. 12, 1_000_000 or 16#33A or 2#10101
+ real, e.g. 11.3, 16#1.FFF#e-3, 2#1.01
+ text, e.g. “ABC”, ‘ABC’, <<Long text, multi-lines>> (configurable
separators)
+ name/symbols, e.g. Foo_Bar, +, <= (precedence and spelling
configurable)
+ infix, e.g. A+B, A and B
+ prefix, e.g. +3, sin X
+ postfix, e.g. 3%, 3 km
+ block, e.g. [A], (A), {A} and indentation blocks
- Fully introspectable (mostly because the parse tree is simple)
- Reversible, i.e. can be printed, including with formatting, e.g.:
% xl -nobuiltins -parse demo/1-hello.xl -show
tell "localhost",
    print "Hello World"
% xl -nobuiltins -parse demo/1-hello.xl -style debug -show
(prefix
  tell
  (infix,
    "localhost"
    (block indent
      (prefix
        print
        "Hello World"
      ))))
- Also has a binary serializer that produces a platform-independent format
- Has multiple implementations, notably in C and C++ (and even one
in XL :-)
- Validated on thousands of lines of input, with various language styles (e.g.
Ada-like or functional)
- Character-level position tracking for error messages in scripts / config
files:
/tmp/xl.xl:1007:8: Mismatched identation, expected “)"
/tmp/xl.xl:2409:23: Mismatched identation, expected ""
- Designed to be easy to read and write
- Powerful enough to parse itself
(https://github.com/c3d/xl/blob/master/xl2/native/xl.parser.xl)
- Dynamically configurable syntax (spelling and precedence of operators)
- Multi-line text with configurable separators, e.g. the following can be made
a text constant by having XML and END_XML as text separators:
XML
<stuff>
Insert your XML here
</stuff>
END_XML
- Based numbers in any base, e.g. 8#777, 16#FFFF_FFFF and 2#1.001 are valid
numbers
- Has essentially a single contributor (me), so easy to relicense as needed
- There is an interpreter, so it could potentially evaluate expressions like 2+3*A
- Relatively fast (6.1s to parse 1M lines of code, about 40 MB of source;
cpp ~1s)
% wc /tmp/tmp.xl
1000000 3893922 41679700 /tmp/tmp.xl
% time xl -parse /tmp/tmp.xl
6.10s user 0.21s system 99% cpu 6.346 total
- Supports multiple styles, e.g. using { } or indentation for blocks, with
or without parentheses, etc.
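To make the “8 node types” point concrete, here is a Python model of such a tree with a renderer loosely imitating `-style debug`. The class and field names are hypothetical and are not taken from src/parser.h:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Integer:
    value: int


@dataclass
class Real:
    value: float


@dataclass
class Name:
    value: str  # names and operator symbols alike, e.g. Foo_Bar or <=


@dataclass
class Text:
    value: str
    quote: str = '"'  # delimiter; XL makes text separators configurable


@dataclass
class Infix:
    name: str  # the operator, e.g. "," or "+"
    left: "Node"
    right: "Node"


@dataclass
class Prefix:
    left: "Node"  # e.g. sin in "sin X"
    right: "Node"


@dataclass
class Postfix:
    left: "Node"
    right: "Node"  # e.g. km in "3 km"


@dataclass
class Block:
    child: "Node"
    delimiters: str  # e.g. "()", "[]", "{}" or "indent"


Node = Union[Integer, Real, Name, Text, Infix, Prefix, Postfix, Block]


def debug(node: Node) -> str:
    """One-line rendering loosely modelled on `-style debug -show`."""
    if isinstance(node, (Integer, Real, Name)):
        return str(node.value)
    if isinstance(node, Text):
        return node.quote + node.value + node.quote
    if isinstance(node, Infix):
        return f"(infix{node.name} {debug(node.left)} {debug(node.right)})"
    if isinstance(node, Prefix):
        return f"(prefix {debug(node.left)} {debug(node.right)})"
    if isinstance(node, Postfix):
        return f"(postfix {debug(node.left)} {debug(node.right)})"
    return f"(block {node.delimiters} {debug(node.child)})"


# The tree for demo/1-hello.xl, following the debug output above.
hello = Prefix(Name("tell"),
               Infix(",", Text("localhost"),
                     Block(Prefix(Name("print"), Text("Hello World")), "indent")))
print(debug(hello))
```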
Cons (but I’m not the best person to come up with cons for this pet project
of mine ;-):
- Idiosyncratic
- Single contributor
- Not well maintained
- Definitely not production quality (even the makefiles are broken ;-)
- Has some CI testing, but it fails, and it’s totally insufficient
- Interpreter far from perfect
- Designed with another purpose in mind (a programming language)
- Syntax is not C-centric, e.g. 16#FFFF instead of 0xFFFF.
- Name syntax does not allow -, i.e. max-ns is “max minus ns”, max_ns OK.
- [insert probably about a thousand others here]
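On the based-number syntax (8#777, 16#FFFF_FFFF, 2#1.001) and the 16#FFFF vs 0xFFFF point, such literals map onto ordinary integers and reals fairly directly. Here is a minimal Python sketch of the conversion. It is not the XL scanner's code. Exponents such as 16#1.FFF#e-3 are deliberately left out and rejected:

```python
from fractions import Fraction


def parse_xl_number(src: str):
    """Convert 12, 1_000_000, 16#33A or 2#1.01 to an int or float.

    '_' is treated as a digit separator. Exponents are not handled, so
    input like 16#1.FFF#e-3 raises ValueError.
    """
    text = src.replace("_", "")
    base = 10
    if "#" in text:
        base_str, text = text.split("#", 1)
        base = int(base_str)
        if not 2 <= base <= 36:  # the range int(..., base) supports
            raise ValueError(f"unsupported base {base}")
    whole, dot, frac = text.partition(".")
    value = int(whole, base)
    if not dot:
        return value
    # Fractional digits f in base b are worth int(f, b) / b**len(f).
    exact = Fraction(value) + Fraction(int(frac, base), base ** len(frac))
    return float(exact)


for literal in ["16#FFFF", "8#777", "2#1.001", "1_000_000"]:
    print(literal, parse_xl_number(literal))
```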
Precedences and other settings can be configured dynamically, through a file
in the current implementations, e.g.
https://github.com/c3d/xl-c/blob/master/xl.syntax.
So we can have a “nice” syntax for the commands and objects, and a single
format that can serve as a config file format, as a command language, and
as a full shell-style language with if, loops, etc.
It also supports nested syntaxes, i.e. dynamic changes of precedence
between selected separators. This is used to support a simplified C syntax
with “extern int foo();”, where the “C” syntax is active between “extern”
and “;”. This could be useful for compatibility.
Parse tree is simple enough that it’s fully introspectable.
There is a (configurable) renderer, so you can generate source from the
internal data structure. The renderer can generate colorized source
code in HTML, so I guess we could generate C data structures relatively
easily.
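As a rough illustration of generating C data structures from such a tree, here is a Python sketch that emits a struct from the `data` member dict of the example. The type mapping, the struct name and the `has_` flags are assumptions. They are loosely modelled on how QAPI-generated C marks optional members and are not the output of any existing generator:

```python
# Hypothetical mapping for the two QAPI types used in the example.
C_TYPES = {"str": "char *", "uint64": "uint64_t"}


def emit_struct(name: str, members: dict) -> str:
    """Emit a C struct declaration for a QAPI-style member dict.

    A leading '*' marks an optional member, which gets a preceding
    bool has_<member> flag; '-' in names becomes '_' for C.
    Unknown types raise KeyError.
    """
    lines = [f"typedef struct {name} {{"]
    for key, qtype in members.items():
        cname = key.lstrip("*").replace("-", "_")
        if key.startswith("*"):
            lines.append(f"    bool has_{cname};")
        ctype = C_TYPES[qtype]
        # Keep pointer declarators tight: "char *id", "uint64_t grow".
        sep = "" if ctype.endswith("*") else " "
        lines.append(f"    {ctype}{sep}{cname};")
    lines.append(f"}} {name};")
    return "\n".join(lines)


DATA = {"id": "str", "*max-ns": "uint64", "*grow": "uint64", "*shrink": "uint64"}
print(emit_struct("IOThreadSetPollParamsArgs", DATA))  # hypothetical struct name
```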
I believe that it is relatively trivial to configure the parser syntax file
to accept the QEMU quasi-JSON (some code changes are required to teach
it to totally ignore whitespace, to avoid error messages).
More complete documentation about the language is here:
https://c3d.github.io/xl, but it’s quite light on implementation details.
So read only if you have a bit of time.