Eric Blake <ebl...@redhat.com> writes:

> On 11/03/2015 11:30 AM, Markus Armbruster wrote:
>> Eric Blake <ebl...@redhat.com> writes:
>>
>>> Previously, working with alternates required two enums, and
>>> some indirection: for type Foo, we created Foo_qtypes[] which
>>> maps each qtype to a member of FooKind_lookup[], then use
>>
>> member of FooKind, actually.
>
> Or entry in the FooKind_lookup[] array.
>
>>
>>> FooKind_lookup[] like we do for other union types.
>>
>> You probably mean FooKind here as well.
>
> I'll play with the wording.
>
>>
>>> This has a subtle bug: since the values of FooKind_lookup
>>> start at zero, all entries of Foo_qtypes that were not
>>> explicitly initialized map to the same branch of the union as
>>> the first member of the alternate, rather than triggering a
>>> failure in visit_get_next_type(). Fortunately, the bug
>>> seldom bites; the very next thing the input visitor does is
>>> try to parse the incoming JSON with the wrong parser, which
>>> fails; the output visitor is not used with a C struct in that
>>> state, and the dealloc visitor has nothing to clean up (so
>>> there is no leak).
>>
>> Yes, I remember us discussing this bug.
>>
>> While reading code to double-check your description, I stumbled over
>> this beauty in generated qapi-visit.c:
>>
>>     visit_get_next_type(v, (int*) &(*obj)->type, BlockdevRef_qtypes,
>>                         name, &err);
>>
>> This casts enum BlockdevRefKind * to int *, which assumes the compiler
>> represents the enum BlockdevRefKind as int or unsigned. It is free to
>> use any integer type, though. Common mistake of programmers with
>> insufficiently developed wariness of C's subtleties.
>>
>> visit_get_next_type() passes the fishy int * on to v->get_next_type().
>> Only implementation is qmp_input_get_next_type(), which uses it so:
>>
>>     *kind = qobjects[qobject_type(qobj)];
>>
>> Latent death trap.
>>
>> Does your patch clean this up?
>
> Yes, and I need to also document that this is an additional bug fix.
>
>>
>>> However, it IS observable in one case: the behavior of an
>>> alternate that contains a 'number' member but no 'int' member
>>> differs according to whether the 'number' was first in the
>>> qapi definition, and when the input being parsed is an integer;
>>> this is because the 'number' parser accepts QTYPE_QINT in
>>> addition to the expected QTYPE_QFLOAT. A later patch will worry
>>> about fixing alternates to parse all inputs that a non-alternate
>>> 'number' would accept, for now it is still marked FIXME.
>
> See [1] below.
>
>>>
>>> This patch fixes the validation bug by deleting the indirection,
>>> and modifying get_next_type() to directly return a qtype code.
>>
>> get_next_type() doesn't return anything. Do you mean "store a qtype
>> code"?
>
> Yes.
>
>>
>>> There is no longer a need to generate an implicit FooKind array
>>
>> FooKind is an enum, not an array.
>
> ...to generate an implicit FooKind enum, nor FooKind_lookup[] array.
>
>>
>>> associated with the alternate type (since the QMP wire format
>>> never uses the stringized counterparts of the C union member
>>> names); that also means we no longer have a collision with an
>>> alternate branch named 'max'. Next, the generated visitor is
>>> fixed to properly detect unexpected qtypes in the switch
>>> statement. This is done via the use of a new
>>> QAPISchemaAlternateTypeTag subclass and the use of a new
>>> member.c_type() method when producing qapi-types. The new
>>> subtype also allows us to clean up a TODO left in the previous
>>> commit.
>>>
>>> Callers now have to know the QTYPE_* mapping when looking at the
>>> discriminator; but so far, only the testsuite was even using the
>>> C struct of an alternate types. If that gets too confusing, we
>>> could reintroduce FooKind, but initialize it differently than
>>> most generated arrays, as in:
>>> typedef enum FooKind {
>>>     FOO_KIND_A = QTYPE_QDICT,
>>>     FOO_KIND_B = QTYPE_QINT,
>>> } FooKind;
>>> to create nicer aliases for knowing when to use foo->a or foo->b
>>> when inspecting foo->type. But without a current client, I
>>> didn't see the point of doing it now.
>
> You have a point below that we either need to reserve MAX and require no
> case-insensitive clashes, or that we will never want to add it. I'm
> leaning towards never going back, because the new way feels so much nicer.

[...]

>>> +++ b/tests/qapi-schema/qapi-schema-test.json
>>> @@ -131,7 +131,7 @@
>>>  'data': { 'value1': 'UserDefZero', 'has_a': 'UserDefZero',
>>>  'u': 'UserDefZero', 'type': 'UserDefZero' } }
>>>  { 'alternate': 'AltName', 'data': { 'type': 'int', 'u': 'bool',
>>> - 'myKind': 'has_a' } }
>>> + 'myKind': 'has_a', 'max': 'str' } }
>>
>> Here, you add the positive test that alternate name 'max' works.
>>
>> One, not mentioned in the commit message.
>
> D'oh.
>
>>
>> Two, the commit message says we may reintroduce FooKind if working with
>> qtype_code turns out to be too confusing. If we ever do that, alternate
>> name 'max' breaks, doesn't it? Shouldn't we keep it reserved then, just
>> in case?
>>
>
> See my comment above; at this point, I doubt we'll ever want to go back,
> so maybe I just need to be more definitive in stating that.

Fair enough.

However, we should also consider QAPI language regularity and
simplicity. To make that argument, I need to back up a bit.

There are three kinds of name collisions: QAPI schema, QMP wire,
generated C.

QAPI schema collisions are the obvious ones:

* JSON object member names must be distinct (enforced by our JSON
  parser).

* Enumeration values must be distinct.

* Each flat union's variant's members must be distinct from the
  non-variant members.

A translation of the schema into a protocol or code can add additional
restrictions.

I believe the translation to QMP wire doesn't add any. We don't mangle
names, we don't add object members or enumeration values.

The translation to C does add some:

* Mangled names can collide even when unmangled names don't.

* Names can collide with names used by the implementation.

It can also remove collision possibilities (e.g. variant and non-variant
members end up in separate name spaces), but that's not important here.

One possible approach would be to let qapi.py worry about QAPI schema /
QMP wire collisions, and the C compiler about generated C collisions.
We rejected that approach, because navigating from the C compiler's
error messages to the broken spot in the QAPI schema is a bother.

Instead, qapi.py knows enough about the generators to predict collisions
in generated C. This is feasible only because the generators follow
simple and regular rules:

* Names are mangled by c_name(), prefixes or suffixes may be tacked on.

* Exception: enumeration values are mangled by camel_to_upper().
  Although the way c_enum_const() is defined, you have to look closely
  to see it.

* Any pair of names that clashes before mangling also clashes after.
  Permits omitting collision checks for unmangled names when the mangled
  names are checked.

* Tag values are mangled *both* ways, because they occur both as C union
  members and as C enum members.

* We reserve a bunch of names for generator use: mangled names can't
  start with 'q_', mangled member names can't start with 'has_', mangled
  enumeration values can't be 'MAX', and so forth.

Remarks:

* A simple union's member names are also tag values, and are therefore
  mangled both ways, and both reserved member and enum value names
  apply.

* The moment you use an enumeration as type of a flat union tag, the
  other mangling and reserved names kicks in.

Conclusions:

* Having two different name manglers is a headache we could do without,
  especially since the second one camel_to_upper() is pretty magic. We
  have it only to get

      typedef enum BlockDeviceIoStatus {
          BLOCK_DEVICE_IO_STATUS_OK = 0,
          BLOCK_DEVICE_IO_STATUS_FAILED = 1,
          BLOCK_DEVICE_IO_STATUS_NOSPACE = 2,
          BLOCK_DEVICE_IO_STATUS_MAX = 3,
      } BlockDeviceIoStatus;

  instead of

      typedef enum BlockDeviceIoStatus {
          BlockDeviceIoStatus_ok = 0,
          BlockDeviceIoStatus_failed = 1,
          BlockDeviceIoStatus_nospace = 2,
          BlockDeviceIoStatus_MAX = 3,
      } BlockDeviceIoStatus;

  Bah! CODING_STYLE doesn't even ask for shouting enumeration
  constants. Can't see why we do.

* Keeping the complexity of the rules under control is good both for
  qapi.py and for the QAPI schema language. To that end, I think we
  should consider reserving the same set of names both for members and
  tag values. It gets rid of complications like enumerations you can't
  use as flat union tags.

  Additionally, the question whether to keep the door open for
  generating an enum for the alternate cases becomes moot.

What do you think?

[...]
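
To make the "mangled both ways" point concrete, here is a rough Python sketch of a collision check for tag values. Everything in it is an assumption for illustration: toy_c_name() and toy_enum_const() are simplified stand-ins, not qapi.py's actual c_name() and camel_to_upper(), and check_tag_values() is a hypothetical helper. The only facts taken from the discussion are the reserved names ('q_', 'has_', 'MAX') and that a tag value has to survive both manglings.

```python
import re

# Hypothetical, simplified stand-ins. These are NOT qapi.py's real
# c_name() / camel_to_upper(); they only mimic the general idea.
def toy_c_name(name):
    """Member-style mangling: anything not valid in a C identifier becomes '_'."""
    return re.sub(r'[^A-Za-z0-9_]', '_', name)

def toy_enum_const(name):
    """Enum-style mangling: split CamelCase humps with '_', then upper-case."""
    humps = re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', name)
    return toy_c_name(humps).upper()

# Reserved names as listed in the discussion; for simplicity the sketch
# applies both prefixes to the member-style mangling of every name.
RESERVED_PREFIXES = ('q_', 'has_')
RESERVED_ENUM_VALUES = ('MAX',)

def check_tag_values(names):
    """Return a list of problems for names that are mangled both ways."""
    problems = []
    members = {}   # member-style mangled name -> first schema name using it
    consts = {}    # enum-style mangled name   -> first schema name using it
    for name in names:
        member = toy_c_name(name)
        const = toy_enum_const(name)
        if member.startswith(RESERVED_PREFIXES):
            problems.append("'%s' uses a reserved prefix" % name)
        if const in RESERVED_ENUM_VALUES:
            problems.append("'%s' mangles to reserved enum value %s"
                            % (name, const))
        if member in members:
            problems.append("'%s' and '%s' collide as C member %s"
                            % (members[member], name, member))
        if const in consts:
            problems.append("'%s' and '%s' collide as C enum constant %s"
                            % (consts[const], name, const))
        members.setdefault(member, name)
        consts.setdefault(const, name)
    return problems

if __name__ == '__main__':
    for schema_names in (['fooBar', 'foo_bar'], ['max'], ['has_a']):
        print(schema_names, '->', check_tag_values(schema_names))
```

With these toy manglers, 'fooBar' and 'foo_bar' are distinct as C members but clash as enum constants, which is the kind of surprise the second mangler brings. Under the proposal to reserve one set of names for both members and tag values, the two reserved-name checks would collapse into one.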