Re: [Qemu-devel] [PATCH v8 04/17] qapi-introspect: Guarantee particular sorting

Markus Armbruster Fri, 30 Oct 2015 04:22:29 -0700

For now, only high-level review.

The main cost of sorting is interface complexity: we need to specify
which things are sorted, and what the sorting order is (see your TODO
below).  Once we've done so, we can't go back.

There's also implementation complexity, but your patch shows it's low
enough to be ignored.

Cost needs be justified by benefits.  Let's have a look at the benefits.

Eric Blake <ebl...@redhat.com> writes:

> Sorting the values of an enum makes it easier to look up whether
> a particular value is present, via binary rather than linear search
> (probably most visible with QKeyCode, which has grown over
> several releases).

Making the interface guarantee "sorted" saves a client that wants it
sorted (say for binary search) the trouble of sorting itself.  We sort
at QEMU compile time instead of client run time.

My qmp-introspect.c has 48 SchemaInfoEnum, ranging from one to 125
members.  Histogram:

 #enums with this #members
      6 1
     11 2
      8 3
      9 4
      2 5
      1 6
      3 7
      1 8
      2 9
      1 15
      1 19
      1 27
      1 43
      1 125

Mean is less than eight.  Median is *three*.

Sorting these member arrays in the client won't have a noticeable
performance impact.  Heck, I suspect even using linear search wouldn't!
Therefore, we can't really claim client performance as a benefit.

We might be able to claim reduced client complexity as a benefit, but
I'm rather skeptical.  When your search space has a dozen members, you
better stick to simple search techniques.  Linear search in array or
list and binary search in array look adequate, hash table already
approaches overkill, and anything fancier definitely is.  Of these, only
binary search profits from a "sorted" guarantee.  But when you got data
in an array already, all it takes to sort it is a call to qsort() and a
comparison function.  Ten straightforward lines of code, tops.  Less in
a language that isn't quite as primitive as C.  Not much of a benefit,
I'm afraid.

>                     Additionally, QMP clients need not know which
> C value is associated with an enum name, so sorting is an
> effective way to hide that non-ABI aspect of qapi.

Sorting indeed hides the implementation detail of how enumerations are
encoded in the server.  However, I can't see what would tempt clients
into relying on it.  I can see for type names, and that's why we hide
them (commit 1a9a507).

One more potential benefit: when the schema changes in a way that
affects only order in introspection, sorting can hide the changes from
clients.  Can't think of a practical use of it, though.

> While we are at it, it is also easy to sort the members and
> variants of objects, to allow for a similar binary search (although
> less compelling, as any struct with that many members should
> arguably be broken into hierarchical substructs), and equally
> valid since JSON objects have no specific order in which keys must
> appear.

My qmp-introspect.c has 48 SchemaInfoObject, ranging from zero to 27
members.  Histogram:

  #objs with this #members
      3 0
    102 2
     58 3
     28 4
     14 5
     17 6
     15 7
      3 8
      2 9
      7 10
      2 11
      3 12
      1 13
      2 14
      1 15
      1 16
      1 27

Mean is 9.5, median is 9.

The variants are also enums, and therefore can't be any worse than
enums.

Again, client performance is hardly a benefit, and client complexity
isn't particularly convincing, either.

>          There is no trivial or obvious way to sort the types of
> an alternate, so that is left unchanged.

In the schema, an alternate's member has a name, a type and nothing
else, so that's what we could use to sort.  The name isn't visible in
introspection, and the type is masked.  Sorting by either hides schema
change from the client, but isn't of much use to the client otherwise.

If you want to let the client use binary search without having to sort
itself, you obviously have to sort by masked type.

My qmp-introspect.c has *two* alternate types, each with two members.

> However, the overall SchemaInfo array remains unsorted.  It might
> make sense to sort with 'meta-type' as a primary key and 'name'
> as a secondary key, but it is not obvious that this will provide
> benefits to end-user clients (we allow mutually recursive types,
> so there is no posible topological sorting where a single pass
> over the array could resolve all types, and while binary searches
> could be made possible by sorting, it would be even more efficient
> for clients to read the array into a hashtable for O(1) rather
> than O(log n) random-access lookups, at which point pre-sorting is
> wasted effort).

My qmp-introspect.c has:

      2 alternate
     55 array
      5 builtin
    126 command
     48 enum
     35 event
    260 object
    -------------
    531 total

Since they all share the same entity name space, I'd use a single hash
table to map from name to SchemaInfo, just like qapi.py's QAPISchema
does.

Since we visit stuff in a defined order, qmp-introspect.c's order is
stable.  The order just isn't particularly useful for clients, let alone
specified.

> Document these conventions, so that clients will know what can
> and cannot be relied on.
>
> Signed-off-by: Eric Blake <ebl...@redhat.com>
>
> ---
> TODO: should the documentation mention that the sorting is done
> in the C locale?

I'd assume C locale because we're sorting plain ASCII strings.  But
spelling it out can't hurt.

>                  Is there anything required to ensure that python
> sorts sanely (ie. that the choice of locale while building
> doesn't cause inadvertent sorting differences such as turning on
> case-insensitivity)?

Good question.

> v8: no change
> v7: tweak commit wording
> v6: no change from v5
> ---
>  docs/qapi-code-gen.txt     | 21 +++++++++++++++++----
>  qapi/introspect.json       | 22 +++++++++++++++++-----
>  scripts/qapi-introspect.py |  9 ++++++---
>  3 files changed, 40 insertions(+), 12 deletions(-)
>
> diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt
> index ba29bc6..163f547 100644
> --- a/docs/qapi-code-gen.txt
> +++ b/docs/qapi-code-gen.txt
> @@ -516,6 +516,13 @@ query-qmp-schema.  QGA currently doesn't support 
> introspection.
>
>  query-qmp-schema returns a JSON array of SchemaInfo objects.  These
>  objects together describe the wire ABI, as defined in the QAPI schema.
> +There is no specified order to the SchemaInfo objects returned; a
> +client must search for a particular name and meta-type throughout the
> +entire array to learn more about that name.  For now, the name for
> +each SchemaInfo is unique thanks to qapi naming conventions; however
> +this may be changed in the future (such as allowing a command and an
> +event with the same name), so it is important that the client check
> +for the desired meta-type.

I feel separate name spaces aren't necessary and would be confusing, and
I'd be prepared to cast "single entity name space" in stone now.

>
>  However, the SchemaInfo can't reflect all the rules and restrictions
>  that apply to QMP.  It's interface introspection (figuring out what's
> @@ -596,7 +603,8 @@ any.  Each element is a JSON object with members "name" 
> (the member's
>  name), "type" (the name of its type), and optionally "default".  The
>  member is optional if "default" is present.  Currently, "default" can
>  only have value null.  Other values are reserved for future
> -extensions.
> +extensions.  The "members" array is sorted by "name", so that clients
> +can use a binary search to see if a particular member is supported.
>
>  Example: the SchemaInfo for MyType from section Struct types
>
[...]

This all boils down to whether the increase in interface specification
complexity is justified by the benefits.  The cost is small, but I'm
having a hard time seeing the benefits, to be honest.

Am I missing something?

Re: [Qemu-devel] [PATCH v8 04/17] qapi-introspect: Guarantee particular sorting

Reply via email to