Re: Historical QAPI schema parser, "compiled schema", and qapi-schema-diff

John Snow Tue, 18 Jun 2024 17:57:19 -0700

On Thu, Jun 13, 2024 at 12:12 PM Daniel P. Berrangé <berra...@redhat.com>
wrote:


> On Thu, Jun 13, 2024 at 02:13:15AM -0400, John Snow wrote:
> > Hi, recently I've been working on overhauling our QMP documentation; see
> > https://jsnow.gitlab.io/qemu/qapi/index.html for a recent
> work-in-progress
> > page showcasing this.
> >
> > As part of this project, Markus and I decided it'd be nice to be able to
> > auto-generate "Since" information. The short reason for 'why' is because
> > since info hard-coded into doc comments may not be accurate with regards
> to
> > the wire protocol availability for a given field when a QAPI definition
> is
> > shared or inherited by multiple sources. If we can generate it, it should
> > always be accurate.
> >
> > So, I've prototyped three things:
> >
> > (1) An out-of-tree fork of the QAPI generator that is capable of parsing
> > qemu-commands.hx, qmp-commands.hx, and all versions of our
> qapi-schema.json
> > files going all the way back to v0.12.0.
> >
> > It accomplishes this with some fairly brutish hacks that I never expect
> to
> > need to check in to qemu.git.
> >
> > (2) A schema "compiler", a QAPI generator module that takes a parsed
> Schema
> > and produces a single-file JSON Schema document that describes every
> > command and event as it exists on the wire without using type names or
> any
> > information not considered to be "API".
> >
> > This part *would* need to be checked in to qemu.git (if we go in this
> > direction.)
> > The compiled historical schema would also get checked in, for the QAPI
> > parser to reference against to generate the since information.
>
> The upside with checking in every historical schema is that we
> have a set of self-contained schemas where you can see everything
> at a glance for each version.
>

Yep. It's "dumb" but very easy to access and work with.


>
> The downside with checking in every historical schema is that between
> any adjacent pair of schemas 99% of the content is identical. IOW we
> are very wasteful of storage.
>

... Also agree. Because these files avoid shared types as an explicit
design goal, and JSON Schema is *very* verbose, these files get extremely
large while saying little.

I chose them for the proof of concept because they're an existing
standard/format I didn't have to engineer or reason about heavily, and
really no other reason.


>
> Looking at your other mail about schema diffs, I wonder if we the
> diff format you show there can kill two birds with one stone.
>
>   https://lists.nongnu.org/archive/html/qemu-devel/2024-06/msg02398.html
>
> In my reply I had illustrated a variant of your format:
>
>  - x-query-rdma
>  -     returns.human-readable-text: str
>  . blockdev-backup
>  +     arguments.discard-source: Optional<boolean>
>  . migrate
>  -    arguments.blk: Optional<boolean>
>  -    arguments.inc: Optional<boolean>
>  . object-add
>  .    arguments.qom-type: enum
>  +        'sev-snp-guest'
>  +    arguments[sev-guest].legacy-vm-type: Optional<boolean>
>  +    arguments[sev-snp-guest].author-key-enabled: Optional<boolean>
>  +    arguments[sev-snp-guest].cbitpos: Optional<integer>
>
>
> Where '.' is just pre-existing context, and +/- have the obvious
> meaning for the 2 given versions.
>
> What if, we append a version number to *every* line, and exclusively
> use +/-.
>
> Taking just one small command:
>
>  + 6.2.0: x-query-rdma
>  + 6.2.0:    returns.human-readable-text: str
>  - 9.1.0: x-query-rdma
>
> This tell us 'x-query-rdma' was added in 6.2.0, the
> 'human-readable-text' parameter arrived at the same
> time, and the whole command was then deleted in 9.1.0
> That has implicit property deletion, but for completeness
> we could be explicit about each property when deleting
> a command:
>
>  + 6.2.0: x-query-rdma
>  + 6.2.0:    returns.human-readable-text: str
>  - 9.1.0:    returns.human-readable-text: str
>  - 9.1.0: x-query-rdma
>
> Taking the more complex 'object-add' command
>
>  +  2.0.0: object-add
>  +  2.0.0:   arguments.qom-type: enum
>  +  2.0.0:     '....'
>  + 2.11.0:     'sev-guest'
>  +  9.1.0:     'sev-snp-guest'
>  + 2.11.0:   arguments[sev-guest].policy: uint32
>  + 2.11.0:   arguments[sev-guest].session-file: str
>  + 2.11.0:   arguments[sev-guest].dh-cert: str
>  +  9.1.0:   arguments[sev-guest].legacy-vm-type: Optional<boolean>
>  +  9.1.0:   arguments[sev-snp-guest].author-key-enabled: Optional<boolean>
>  +  9.1.0:   arguments[sev-snp-guest].cbitpos: Optional<integer>
>
>
> IOW, object-add was introduced in 2.0.0. The 'sev-guest' enum
> variant was added in 2.11.0 with various fields at the same
> time. The 'sev-guest' enum variant got an exctra field in 9.1.0
> The 'sev-snp-guest' enum variant was added in 9.1.0 with some
> fields.
>
>
> For fields which change from Optional <-> Required, that could
> be modelled simply as parameter deletion + addition in the
> same version eg hypothetically lets say the 'sev-guest' field
> 'policy' had changed, we would see:
>
>  +  2.0.0: object-add
>  +  2.0.0:   arguments.qom-type: enum
>  +  2.0.0:     '....'
>  + 2.11.0:     'sev-guest'
>  +  9.1.0:     'sev-snp-guest'
>  + 2.11.0:   arguments[sev-guest].policy: uint32
>  -  6.2.0:   arguments[sev-guest].policy: uint32
>  +  6.2.0:   arguments[sev-guest].policy: Optional<uint32>
>  + 2.11.0:   arguments[sev-guest].session-file: str
>  + 2.11.0:   arguments[sev-guest].dh-cert: str
>  +  9.1.0:   arguments[sev-guest].legacy-vm-type: Optional<boolean>
>  +  9.1.0:   arguments[sev-snp-guest].author-key-enabled: Optional<boolean>
>  +  9.1.0:   arguments[sev-snp-guest].cbitpos: Optional<integer>
>
>
Very interesting idea, I think this could be a reasonable compromise. I'll
have to spend some time prototyping it (and digesting your other mail,
too), but tentatively, I like the idea. Thanks a lot for really digging
into both of these mails to give your feedback on this subproject.

(IOW: I think I like it, but haven't sat with it enough to really know if
there's anything it doesn't do that I need it to do. Prototyping it will
tell me. One concern I might have is that I'll need some custom code to
compare a QAPISchema object against the stored history file in order to
amend that history file. I'm not sure how complex that will be at present,
but admit my current solution is very egregious with regards to SLOC in the
git repo. And it's not as if the JSON Schema writing/reading code I
prototyped is particularly short, either.)


>
> Incidentally, if going down this route, I think I would NOT
> have 1 file with the whole schema history, but have 1 file
> per command / event. eg qapi/history/object-add.txt,
> qapi/history/x-query-rdma.txt, qapi/history/VFIO_MIGRATION.txt,
> etc. This will make it trivial for a person to focus in on
> changes in the command they care about, likely without even
> needing a schema diff tool much of the time, as the per-command
> files will often be concise enough you can consider the full
> history without filtering.
>

Interesting idea... might be a lot of files, but I suppose those don't
really *cost* anything, now do they? :)

I guess you lose out on a good summary, but a tool can just parse
qapi/history/*.txt or whatnot and concatenate the results to stdout for
you; I suppose it'd be little more than `cat qapi/history/*.txt | grep
"9\.0\.0"` or similar.


>
> > (3) A script that can diff two compiled schema, showing a change report
> > between two versions. (I sent an email earlier today/yesterday showing
> > example output of this script.) This one was more for "fun", but it
> helped
> > prove all the other parts were working correctly, and it might be useful
> in
> > the future when auditing changes during the RC phase. We may well decide
> to
> > commit this script upstream, or one like it.
>
> With a single file containing all deltas, where each line is
> version annotated, the "diff" tool becomes little more than
> something which can 'grep' for lines in the file which have
> a version number within the desired range. In fact it can also
> optionally offer something better than a diff, as instead of
> showing you only the orignal state and result state, it
> can trivially shows you any intermediate changes and what
> version they happened with.
>
> eg if you asked for a diff between 2.0.0 and 9.1.0, and there
> was a command or property that was added in 4.0.0 and deleted
> in 6.0.0, a traditional diff will not tell you about this. You'll
> never notice it ever existed.
>
> A "history grep" showing the set of changes between 2 versions
> will highlight things that come + go, which can be quite
> useful for understanding API evolution I think.
>

Good point. The existing diff tool I wrote was just a prototype to prove
"this sort of thing was possible", but I didn't put much thought into its
design beyond "It was quick to write as a proof of concept".

Maximizing this information's utility for use with existing utilities
without needing to maintain lots of our own script code is a great design
goal to keep in mind.


>
>
>
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-
> https://www.instagram.com/dberrange :|
>

Thanks again! I'm going to re-focus on some of the more immediate changes
for the documentation project for now, but I'll no doubt be returning to
the historical parsing / since information subproject before too long --
just didn't want to sit on your email for too long so as to appear
ungrateful ;)

I'll loop you into future discussions on this subproject when I pick it
back up (Hopefully, not too far in the future.) -- and I'll make sure to
keep it on-list. Markus and I haven't gone too in-depth on this part yet,
so I figure I'll pick the prototyping back up when he's chewed through more
of my other patches and all of the Maintainers that need to care about this
are paying active attention. (Sorry for all the patches, Markus... You
asked for it!)

--js

Re: Historical QAPI schema parser, "compiled schema", and qapi-schema-diff

Reply via email to