Re: defaults for complex types (was Re: recursive types)

Lee Hambley Thu, 05 Dec 2019 10:02:50 -0800

Hi Rog,

Glad my pointers were useful; the Avro spec really is a marvel.


Regarding your follow-up question, I'm honestly not sure. It's an
interesting (if contrived) example, though, and it's interesting that no
matter how well written the spec is, it can still be ambiguous.

I found this snippet in the 1.9.x docs, where I know there were some changes
to defaults for complex types; the 1.8 docs may be incomplete in that
regard ( https://avro.apache.org/docs/1.9.0/spec.html#schema_complex ):

> Default values for union fields correspond to the first schema in the
> union. Default values for bytes and fixed fields are JSON strings, where
> Unicode code points 0-255 are mapped to unsigned 8-bit byte values 0-255.
>

I take `Default values for union fields correspond to the first schema in
the union` to mean either that your default, which includes values from the
2nd schema in the union, is invalid, *or* that where a value exists in the
first schema it refers to the first schema, and when not, it refers to the
first schema in which it _does_ exist.
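
To make those two readings concrete, here is a minimal Python sketch that
resolves Rog's default ["A", "B", "C"] against the union [E1, E2] under each
one. It is not taken from any Avro implementation: both function names are
hypothetical, and the spec confirms neither behaviour.

```python
# Hypothetical sketch: neither function comes from an Avro library, and the
# spec does not say which (if either) reading is the intended one.
UNION = [
    {"type": "enum", "name": "E1", "symbols": ["A", "B"]},
    {"type": "enum", "name": "E2", "symbols": ["B", "A", "C"]},
]
DEFAULT = ["A", "B", "C"]


def resolve_first_branch_only(union, items):
    """Reading 1: every default item must be valid for the first branch."""
    first = union[0]
    for item in items:
        if item not in first["symbols"]:
            raise ValueError(f"{item!r} is not a symbol of {first['name']}")
    return [(first["name"], item) for item in items]


def resolve_first_matching_branch(union, items):
    """Reading 2: each item resolves to the first branch that contains it."""
    resolved = []
    for item in items:
        for branch in union:
            if item in branch["symbols"]:
                resolved.append((branch["name"], item))
                break
        else:
            # No branch of the union declares this symbol.
            raise ValueError(f"{item!r} matches no branch of the union")
    return resolved


if __name__ == "__main__":
    try:
        print(resolve_first_branch_only(UNION, DEFAULT))
    except ValueError as err:
        print("reading 1 rejects the default:", err)
    print(resolve_first_matching_branch(UNION, DEFAULT))
```

Under the first reading the default is rejected because "C" is not a symbol
of E1. Under the second, "A" and "B" resolve to E1 and "C" to E2, so a
default could never select E2's "A" or "B".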

One way to find out would be to run some data through a couple of common
implementations and see how they handle it, and maybe feed that back into
the Avro docs in the form of a PR if you come up with something useful?

Either way, I'm curious now! Let me know when you have an answer?

Cheers,

Lee Hambley
http://lee.hambley.name/
+49 (0) 170 298 5667


On Thu, 5 Dec 2019 at 14:07, roger peppe <rogpe...@gmail.com> wrote:

> On Wed, 4 Dec 2019 at 11:38, Lee Hambley <lee.hamb...@gmail.com> wrote:
>
>> Hi Rog,
>>
>> Good question. The answer lies in the docs, in the "Parsing Canonical Form
>> for Schemas" section, which states (amongst all the other transformation
>> rules):
>>
>>> [ORDER] Order the appearance of fields of JSON objects as follows: *name*,
>>> type, *fields*, symbols, items, values, size. For example, if an
>>> object has type, name, and size fields, then the name field should
>>> appear first, followed by the type and then the size fields.
>>
>>
>> (emphasis mine)
>>
>> The canonical form for schemas becomes more relevant to Avro usage when
>> working with a schema registry, for example, but it's a really common use-case
>> and I consider definition of a canonical form for schema comparisons to be
>> a strength of Avro compared with other serialization formats.
>>
>> -
>> https://avro.apache.org/docs/1.8.2/spec.html#Parsing+Canonical+Form+for+Schemas
>>
>
> Thanks very much - I'd missed that, very helpful!
>
> Maybe you might be able to help with another part of the spec that I've
> been puzzling over too: default values for complex types.
> The spec doesn't seem to say how unions in complex types are specified
> when in default values.
>
> For example, consider the following schema:
>
> {
>     "type": "record",
>     "name": "R",
>     "fields": [
>         {
>             "name": "F",
>             "type": {
>                 "type": "array",
>                 "items": [
>                     {
>                         "type": "enum",
>                         "name": "E1",
>                         "symbols": ["A", "B"]
>                     },
>                     {
>                         "type": "enum",
>                         "name": "E2",
>                         "symbols": ["B", "A", "C"]
>                     }
>                 ]
>             },
>             "default": ["A", "B", "C"]
>         }
>     ]
> }
>
> This seems like it should be valid according to the spec, because default
> value encodings don't encode the type name in enums, unlike in the JSON
> encoding, but in this case there seems to be no way to tell which enum types end
> up in the array value of the field F, because the enum symbols themselves
> are ambiguous.
>
> How are schema validators meant to resolve this ambiguity?
>
>  cheers,
>     rog.
>
>
>> HTH,
>>
>> Lee Hambley
>> http://lee.hambley.name/
>> +49 (0) 170 298 5667
>>
>>
>> On Wed, 4 Dec 2019 at 12:17, roger peppe <rogpe...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> My apologies in advance if this topic has been well discussed before -
>>> the mailing list search tool appears to be broken (the link points to the
>>> expired domain name "search-hadoop.com").
>>>
>>> I'm trying to understand recursive types in Avro, given that the
>>> specification says about names
>>> <http://avro.apache.org/docs/current/spec.html#names>:
>>>
>>>> a name must be defined before it is used ("before" in the depth-first,
>>>> left-to-right traversal of the JSON parse tree, where the types attribute
>>>> of a protocol is always deemed to come "before" the messages
>>>> attribute.)
>>>
>>>
>>> By my reading, this would make the following Avro schema invalid,
>>> because the name "R" will not yet be defined when it's referenced inside
>>> the type of the field F, because in depth-first order, the leaf is
>>> traversed before the root.
>>>
>>> {
>>>     "type": "record",
>>>     "fields": [
>>>         {"name": "F", "type": ["null", "R"]}
>>>     ],
>>>     "name": "R"
>>> }
>>>
>>> It seems that types like this are valid in practice (I found the above
>>> example in an Avro test suite), so could someone enlighten me as to how
>>> this is allowed, please?
>>>
>>> Thanks for any info. If I'm asking in the wrong place, please advise me
>>> of a better forum!
>>>
>>>     rog.
>>>
>>>
>>>
