For now, only high-level review. The main cost of sorting is interface complexity: we need to specify which things are sorted, and what the sorting order is (see your TODO below). Once we've done so, we can't go back.
There's also implementation complexity, but your patch shows it's low enough to be ignored. Cost needs be justified by benefits. Let's have a look at the benefits. Eric Blake <ebl...@redhat.com> writes: > Sorting the values of an enum makes it easier to look up whether > a particular value is present, via binary rather than linear search > (probably most visible with QKeyCode, which has grown over > several releases). Making the interface guarantee "sorted" saves a client that wants it sorted (say for binary search) the trouble of sorting itself. We sort at QEMU compile time instead of client run time. My qmp-introspect.c has 48 SchemaInfoEnum, ranging from one to 125 members. Histogram: #enums with this #members 6 1 11 2 8 3 9 4 2 5 1 6 3 7 1 8 2 9 1 15 1 19 1 27 1 43 1 125 Mean is less than eight. Median is *three*. Sorting these member arrays in the client won't have a noticeable performance impact. Heck, I suspect even using linear search wouldn't! Therefore, we can't really claim client performance as a benefit. We might be able to claim reduced client complexity as a benefit, but I'm rather skeptical. When your search space has a dozen members, you better stick to simple search techniques. Linear search in array or list and binary search in array look adequate, hash table already approaches overkill, and anything fancier definitely is. Of these, only binary search profits from a "sorted" guarantee. But when you got data in an array already, all it takes to sort it is a call to qsort() and a comparison function. Ten straightforward lines of code, tops. Less in a language that isn't quite as primitive as C. Not much of a benefit, I'm afraid. > Additionally, QMP clients need not know which > C value is associated with an enum name, so sorting is an > effective way to hide that non-ABI aspect of qapi. Sorting indeed hides the implementation detail of how enumerations are encoded in the server. However, I can't see what would tempt clients into relying on it. I can see for type names, and that's why we hide them (commit 1a9a507). One more potential benefit: when the schema changes in a way that affects only order in introspection, sorting can hide the changes from clients. Can't think of a practical use of it, though. > While we are at it, it is also easy to sort the members and > variants of objects, to allow for a similar binary search (although > less compelling, as any struct with that many members should > arguably be broken into hierarchical substructs), and equally > valid since JSON objects have no specific order in which keys must > appear. My qmp-introspect.c has 48 SchemaInfoObject, ranging from zero to 27 members. Histogram: #objs with this #members 3 0 102 2 58 3 28 4 14 5 17 6 15 7 3 8 2 9 7 10 2 11 3 12 1 13 2 14 1 15 1 16 1 27 Mean is 9.5, median is 9. The variants are also enums, and therefore can't be any worse than enums. Again, client performance is hardly a benefit, and client complexity isn't particularly convincing, either. > There is no trivial or obvious way to sort the types of > an alternate, so that is left unchanged. In the schema, an alternate's member has a name, a type and nothing else, so that's what we could use to sort. The name isn't visible in introspection, and the type is masked. Sorting by either hides schema change from the client, but isn't of much use to the client otherwise. If you want to let the client use binary search without having to sort itself, you obviously have to sort by masked type. My qmp-introspect.c has *two* alternate types, each with two members. > However, the overall SchemaInfo array remains unsorted. It might > make sense to sort with 'meta-type' as a primary key and 'name' > as a secondary key, but it is not obvious that this will provide > benefits to end-user clients (we allow mutually recursive types, > so there is no posible topological sorting where a single pass > over the array could resolve all types, and while binary searches > could be made possible by sorting, it would be even more efficient > for clients to read the array into a hashtable for O(1) rather > than O(log n) random-access lookups, at which point pre-sorting is > wasted effort). My qmp-introspect.c has: 2 alternate 55 array 5 builtin 126 command 48 enum 35 event 260 object ------------- 531 total Since they all share the same entity name space, I'd use a single hash table to map from name to SchemaInfo, just like qapi.py's QAPISchema does. Since we visit stuff in a defined order, qmp-introspect.c's order is stable. The order just isn't particularly useful for clients, let alone specified. > Document these conventions, so that clients will know what can > and cannot be relied on. > > Signed-off-by: Eric Blake <ebl...@redhat.com> > > --- > TODO: should the documentation mention that the sorting is done > in the C locale? I'd assume C locale because we're sorting plain ASCII strings. But spelling it out can't hurt. > Is there anything required to ensure that python > sorts sanely (ie. that the choice of locale while building > doesn't cause inadvertent sorting differences such as turning on > case-insensitivity)? Good question. > v8: no change > v7: tweak commit wording > v6: no change from v5 > --- > docs/qapi-code-gen.txt | 21 +++++++++++++++++---- > qapi/introspect.json | 22 +++++++++++++++++----- > scripts/qapi-introspect.py | 9 ++++++--- > 3 files changed, 40 insertions(+), 12 deletions(-) > > diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt > index ba29bc6..163f547 100644 > --- a/docs/qapi-code-gen.txt > +++ b/docs/qapi-code-gen.txt > @@ -516,6 +516,13 @@ query-qmp-schema. QGA currently doesn't support > introspection. > > query-qmp-schema returns a JSON array of SchemaInfo objects. These > objects together describe the wire ABI, as defined in the QAPI schema. > +There is no specified order to the SchemaInfo objects returned; a > +client must search for a particular name and meta-type throughout the > +entire array to learn more about that name. For now, the name for > +each SchemaInfo is unique thanks to qapi naming conventions; however > +this may be changed in the future (such as allowing a command and an > +event with the same name), so it is important that the client check > +for the desired meta-type. I feel separate name spaces aren't necessary and would be confusing, and I'd be prepared to cast "single entity name space" in stone now. > > However, the SchemaInfo can't reflect all the rules and restrictions > that apply to QMP. It's interface introspection (figuring out what's > @@ -596,7 +603,8 @@ any. Each element is a JSON object with members "name" > (the member's > name), "type" (the name of its type), and optionally "default". The > member is optional if "default" is present. Currently, "default" can > only have value null. Other values are reserved for future > -extensions. > +extensions. The "members" array is sorted by "name", so that clients > +can use a binary search to see if a particular member is supported. > > Example: the SchemaInfo for MyType from section Struct types > [...] This all boils down to whether the increase in interface specification complexity is justified by the benefits. The cost is small, but I'm having a hard time seeing the benefits, to be honest. Am I missing something?