On Wed, 4 Dec 2019 at 11:38, Lee Hambley <lee.hamb...@gmail.com> wrote:
> Hi Rog,
>
> Good question, the answer lies in the docs in the "Parsing Canonical Form
> for Schemas" section, where it states (amongst all the other
> transformation rules):
>
>> [ORDER] Order the appearance of fields of JSON objects as follows: *name*,
>> type, *fields*, symbols, items, values, size. For example, if an object
>> has type, name, and size fields, then the name field should appear
>> first, followed by the type and then the size fields.
>
> (emphasis mine)
>
> The canonical form for schemas becomes more relevant to Avro usage when
> working with a schema registry, for example, but it's a really common
> use case, and I consider the definition of a canonical form for schema
> comparisons to be a strength of Avro compared with other serialization
> formats.
>
> - https://avro.apache.org/docs/1.8.2/spec.html#Parsing+Canonical+Form+for+Schemas

Thanks very much - I'd missed that, very helpful!

Maybe you might be able to help with another part of the spec that I've
been puzzling over too: default values for complex types. The spec doesn't
seem to say how unions in complex types are specified when in default
values.

For example, consider the following schema:

    {
      "type": "record",
      "name": "R",
      "fields": [
        {
          "name": "F",
          "type": {
            "type": "array",
            "items": [
              {
                "type": "enum",
                "name": "E1",
                "symbols": ["A", "B"]
              },
              {
                "type": "enum",
                "name": "E2",
                "symbols": ["B", "A", "C"]
              }
            ]
          },
          "default": ["A", "B", "C"]
        }
      ]
    }

This seems like it should be valid according to the spec, because default
value encodings don't encode the type name in enums, unlike in the JSON
encoding. But in this case there seems to be no way to tell which enum
types end up in the array value of the field F, because the enum symbols
themselves are ambiguous.

How are schema validators meant to resolve this ambiguity?

cheers,
  rog.
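To make the ambiguity concrete, here is a minimal Python sketch (standard library only) that lists, for each symbol in the default value, which enum branches of the union could hold it. The helper name candidate_enums is made up for illustration; it is not part of any Avro library, and the sketch does not claim how any real validator decides.

```python
import json

# The schema from the message above, unchanged.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "R",
  "fields": [
    {
      "name": "F",
      "type": {
        "type": "array",
        "items": [
          {"type": "enum", "name": "E1", "symbols": ["A", "B"]},
          {"type": "enum", "name": "E2", "symbols": ["B", "A", "C"]}
        ]
      },
      "default": ["A", "B", "C"]
    }
  ]
}
""")


def candidate_enums(union, symbol):
    """Hypothetical helper: names of the enum branches that contain `symbol`."""
    return [
        branch["name"]
        for branch in union
        if isinstance(branch, dict)
        and branch.get("type") == "enum"
        and symbol in branch["symbols"]
    ]


field = SCHEMA["fields"][0]
union = field["type"]["items"]

# Map each default symbol to every enum branch it could belong to.
matches = {sym: candidate_enums(union, sym) for sym in field["default"]}

for sym, names in matches.items():
    print(sym, "->", names)
```

"A" and "B" each match both E1 and E2, while "C" matches only E2, so the default alone does not pin down a branch for the first two elements.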
> HTH,
>
> Lee Hambley
> http://lee.hambley.name/
> +49 (0) 170 298 5667
>
>
> On Wed, 4 Dec 2019 at 12:17, roger peppe <rogpe...@gmail.com> wrote:
>
>> Hi,
>>
>> My apologies in advance if this topic has been well discussed before -
>> the mailing list search tool appears to be broken (the link points to the
>> expired domain name "search-hadoop.com").
>>
>> I'm trying to understand about recursive types in Avro, given that the
>> specification says about names
>> <http://avro.apache.org/docs/current/spec.html#names>:
>>
>>> a name must be defined before it is used ("before" in the depth-first,
>>> left-to-right traversal of the JSON parse tree, where the types attribute
>>> of a protocol is always deemed to come "before" the messages attribute.)
>>
>> By my reading, this would make the following Avro schema invalid, because
>> the name "R" will not yet be defined when it's referenced inside the type
>> of the field F, because in depth-first order, the leaf is traversed before
>> the root.
>>
>> {
>>   "type": "record",
>>   "fields": [
>>     {"name": "F", "type": ["null", "R"]}
>>   ],
>>   "name": "R"
>> }
>>
>> It seems that types like this are valid in practice (I found the above
>> example in an Avro test suite), so could someone enlighten me as to how
>> this is allowed, please?
>>
>> Thanks for any info. If I'm asking in the wrong place, please advise me
>> of a better forum!
>>
>> rog.
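As an illustration of the [ORDER] rule quoted earlier in the thread, the following Python sketch reorders the keys of the recursive schema from the original question. It covers only the ORDER rule, not the rest of the Parsing Canonical Form transformations, and the function name reorder is made up for this example. Keeping unlisted keys after the listed ones is an assumption of the sketch, not something the quoted rule specifies.

```python
import json

# Key order given by the [ORDER] rule quoted in the thread.
CANONICAL_ORDER = ["name", "type", "fields", "symbols", "items", "values", "size"]


def reorder(node):
    """Hypothetical helper: recursively apply the ORDER rule to JSON objects."""
    if isinstance(node, dict):
        listed = [k for k in CANONICAL_ORDER if k in node]
        # Assumption: keys outside the rule are kept, after the listed ones.
        others = [k for k in node if k not in CANONICAL_ORDER]
        return {k: reorder(node[k]) for k in listed + others}
    if isinstance(node, list):
        return [reorder(item) for item in node]
    return node


# The schema from the original question, with "name" written last.
schema = json.loads("""
{
  "type": "record",
  "fields": [
    {"name": "F", "type": ["null", "R"]}
  ],
  "name": "R"
}
""")

reordered = reorder(schema)
print(json.dumps(reordered))
```

After reordering, "name" precedes "fields", so a depth-first, left-to-right walk of the reordered object sees the name "R" before the reference to it inside field F. This shows only what the ORDER rule does to the key order; it does not establish how any particular parser resolves names.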