I worry that an implicit resolution logic will cause confusion. According to the spec, records should only match if their names (or one of the aliases) match. From what I understand, some implementations are more flexible when names are missing (which is fine since user-generated records should always have a name), but allowing further special cases doesn’t seem like a clean way forward. The resolution rules are already relatively complex.
It also seems hard to come up with a “structure fit metric” which will work for all use-cases and remain simple to understand. In practical uses of this, I can imagine that multiple fields will match so predicting computed defaults from numbers of matches would be error-prone. In some cases users might end up having to add unused fields to branches to get the correct default selected. An explicit default branch selector would be a much more straightforward user experience. Since the current union declaration syntax doesn’t allow attributes, I wonder whether it would be a good time to introduce an alternative syntax similar to what Doug mentioned (which might also address requests for “named unions”)? The existing “anonymous” unions could still be used when no such resolution is required. Delegating default selection to the branches themselves is a great idea but it will be tricky to handle reference branches smoothly. More minor but it also doesn’t feel intuitive to not have the union “own” its default attribute. What do you think? > On Apr 20, 2016, at 9:09 PM, Ryan Blue <[email protected]> wrote: > > Interesting idea, Doug. I was thinking about just updating the logic like > the fix in AVRO-1590 <https://issues.apache.org/jira/browse/AVRO-1590>, but > it would be nice to have a way to do this in the spec. > > What do you think about dealing with this by updating the union branch > matching rules (and possibly adding them to the spec) rather than adding a > property to one of the union's schemas? Making the default a property of an > inner schema makes me think that we will have to deal with multiple schemas > with such a label at some point. Maybe it would be better to try to match > automatically by some metric like largest subset of fields projected. > > rb > > On Mon, Apr 18, 2016 at 2:13 PM, Doug Cutting <[email protected]> wrote: > >> I was responding to the "For unions, we will add an optional catch-all >> attribute" part. The syntax of union schemas is unfortunately hard to >> modify compatibly. >> >> Here's a way around that. Recall that every kind of schema, except >> union, supports arbitrary properties, and unions cannot be directly >> nested. Given this, we could have a property on one of the branch >> schemas that declares it as the branch to be used when the writer's >> branch is not found in the reader. For example, if one wanted >> unmatched to be read as null, then the reader could use a union like, >> [{"type":"null","isDefaultBranch":true}, ... ], or one could add that >> property to a record schema that acts as a base class. Note however >> that, when using a record as the default branch, one could not then >> use that same record as a non-default branch in another union. To >> ameliorate that, we might permit multiple default branches in a union >> to be specified as default with the convention that the first such is >> used. >> >> Doug >> >> >> On Mon, Apr 18, 2016 at 1:18 PM, Ryan Blue <[email protected]> >> wrote: >>> Sorry, I thought you were talking about the more recent topic, the enum >>> changes. I see what you're saying now about unions. >>> >>> My initial proposal was not to change how unions are represented in the >>> schema, but to update how resolution happens. Right now, we will match a >>> read schema to a branch by structure for cases where the record has been >>> renamed. This would apply similar logic, allowing a more generic record >> to >>> match. >>> >>> I wasn't thinking there would be a schema change, though we could >> certainly >>> make that happen to standardize this behavior. >>> >>> rb >>> >>> On Mon, Apr 18, 2016 at 1:11 PM, Ryan Blue <[email protected]> wrote: >>> >>>> Doug, I don't think I understand. Why would this change a union >>>> representation? >>>> >>>> This wouldn't change the schema format, other than to add an attribute >> to >>>> enum types that is ignored by older readers. New readers will use that >>>> attribute to determine which symbol to use when the written symbol is >>>> unknown. >>>> >>>> rb >>>> >>>> On Mon, Apr 18, 2016 at 12:59 PM, Doug Cutting <[email protected]> >> wrote: >>>> >>>>> Perhaps then its sufficient to only write the new schema format when >>>>> the new attribute is specified, so existing apps will continue to >>>>> represent unions as JSON arrays? If so, this should probably be >>>>> written into the spec. >>>>> >>>>> On Mon, Apr 18, 2016 at 12:52 PM, Ryan Blue <[email protected] >>> >>>>> wrote: >>>>>> Isn't the problem that these changes aren't compatible right now >>>>> anyway? If >>>>>> I need to add an entry to an enum right now, older readers fail when >>>>> trying >>>>>> to handle that data. This creates a way to avoid that failure in new >>>>>> versions. >>>>>> >>>>>> On Mon, Apr 18, 2016 at 12:48 PM, Doug Cutting <[email protected]> >>>>> wrote: >>>>>> >>>>>>> On Sun, Apr 17, 2016 at 2:00 PM, Matthieu Monsch < >> [email protected]> >>>>>>> wrote: >>>>>>>> + For unions, we will add an optional catch-all attribute to mark >> a >>>>>>> branch as resolution target when no names or aliases match (and >> come up >>>>>>> with the corresponding syntax). >>>>>>> >>>>>>> Can this be compatible? If you add a new union syntax (e.g., >>>>>>> {"type":"union", "branches":[...], "default":...}) then existing >>>>>>> implementations will not be able to read new data that uses this >>>>>>> feature. >>>>>>> >>>>>>> Doug >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Ryan Blue >>>>>> Software Engineer >>>>>> Netflix >>>>> >>>> >>>> >>>> >>>> -- >>>> Ryan Blue >>>> Software Engineer >>>> Netflix >>>> >>> >>> >>> >>> -- >>> Ryan Blue >>> Software Engineer >>> Netflix >> > > > > -- > Ryan Blue > Software Engineer > Netflix
