defaults for complex types (was Re: recursive types)

roger peppe Thu, 05 Dec 2019 05:08:15 -0800

On Wed, 4 Dec 2019 at 11:38, Lee Hambley <lee.hamb...@gmail.com> wrote:


> Hi Rog,
>
> Good question. The answer lies in the docs, in the "Parsing Canonical Form
> for Schemas" section, which states (amongst all the other transformation
> rules):
>
>> [ORDER] Order the appearance of fields of JSON objects as follows: *name*,
>> type, *fields*, symbols, items, values, size. For example, if an object
>> has type, name, and size fields, then the name field should appear
>> first, followed by the type and then the size fields.
>
>
> (emphasis mine)
>
> The canonical form for schemas becomes more relevant to Avro usage when
> working with a schema registry, for example, but that's a really common
> use-case, and I consider the definition of a canonical form for schema
> comparisons to be a strength of Avro compared with other serialization
> formats.
>
> -
> https://avro.apache.org/docs/1.8.2/spec.html#Parsing+Canonical+Form+for+Schemas
>

Thanks very much - I'd missed that, very helpful!
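
To check my understanding, here is a minimal sketch of just the [ORDER] rule
applied to the recursive schema from my earlier message. It's plain Python
using only the standard json module; the order_fields helper is my own
illustration, not part of any Avro library, and it ignores the other
canonical-form transformations. Keeping keys outside the rule after the
known ones is an assumption of the sketch.

```python
import json

# Field order taken from the [ORDER] rule quoted above.
CANONICAL_ORDER = ["name", "type", "fields", "symbols", "items", "values", "size"]


def order_fields(node):
    """Recursively reorder JSON object keys per the [ORDER] rule only."""
    if isinstance(node, dict):
        known = [k for k in CANONICAL_ORDER if k in node]
        # Assumption of this sketch: keys not named by the rule are kept
        # and placed after the known ones.
        other = [k for k in node if k not in CANONICAL_ORDER]
        return {k: order_fields(node[k]) for k in known + other}
    if isinstance(node, list):
        return [order_fields(n) for n in node]
    return node


schema = json.loads("""
{
    "type": "record",
    "fields": [
        {"name": "F", "type": ["null", "R"]}
    ],
    "name": "R"
}
""")

ordered = order_fields(schema)
print(json.dumps(ordered, indent=4))
```

With "name" moved ahead of "fields", the name "R" appears before the
reference to it inside F, which, if I read you right, is why the schema is
acceptable.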

Maybe you might be able to help with another part of the spec that I've
been puzzling over too: default values for complex types.
The spec doesn't seem to say how unions nested inside complex types are
represented in default values.

For example, consider the following schema:

{
    "type": "record",
    "name": "R",
    "fields": [
        {
            "name": "F",
            "type": {
                "type": "array",
                "items": [
                    {
                        "type": "enum",
                        "name": "E1",
                        "symbols": ["A", "B"]
                    },
                    {
                        "type": "enum",
                        "name": "E2",
                        "symbols": ["B", "A", "C"]
                    }
                ]
            },
            "default": ["A", "B", "C"]
        }
    ]
}

This seems like it should be valid according to the spec, because default
value encodings don't encode the type name in enums, unlike in the JSON
encoding, but in this case there seems to be no way to tell which enum types
end up in the array value of the field F, because the enum symbols themselves
are ambiguous.
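
To make the ambiguity concrete, here is a small sketch that lists, for each
symbol in the default, which enum branches of the union could have produced
it. It's plain Python with the standard json module; candidate_enums is a
hypothetical helper of my own, and this is not how any Avro implementation
actually validates defaults.

```python
import json

schema = json.loads("""
{
    "type": "record",
    "name": "R",
    "fields": [
        {
            "name": "F",
            "type": {
                "type": "array",
                "items": [
                    {"type": "enum", "name": "E1", "symbols": ["A", "B"]},
                    {"type": "enum", "name": "E2", "symbols": ["B", "A", "C"]}
                ]
            },
            "default": ["A", "B", "C"]
        }
    ]
}
""")


def candidate_enums(union, symbol):
    """Return the names of enum branches in the union that contain symbol."""
    return [
        branch["name"]
        for branch in union
        if isinstance(branch, dict)
        and branch.get("type") == "enum"
        and symbol in branch["symbols"]
    ]


field = schema["fields"][0]
union = field["type"]["items"]

# Map each default symbol to every enum type it could belong to.
matches = {sym: candidate_enums(union, sym) for sym in field["default"]}
for sym, names in matches.items():
    print(sym, "->", names)
```

Only "C" maps to a single enum; "A" and "B" each match both E1 and E2.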

How are schema validators meant to resolve this ambiguity?

 cheers,
    rog.


> HTH,
>
> Lee Hambley
> http://lee.hambley.name/
> +49 (0) 170 298 5667
>
>
> On Wed, 4 Dec 2019 at 12:17, roger peppe <rogpe...@gmail.com> wrote:
>
>> Hi,
>>
>> My apologies in advance if this topic has been well discussed before -
>> the mailing list search tool appears to be broken (the link points to the
>> expired domain name "search-hadoop.com").
>>
>> I'm trying to understand recursive types in Avro, given that the
>> specification says about names
>> <http://avro.apache.org/docs/current/spec.html#names>:
>>
>>> a name must be defined before it is used ("before" in the depth-first,
>>> left-to-right traversal of the JSON parse tree, where the types attribute
>>> of a protocol is always deemed to come "before" the messages attribute.)
>>
>>
>> By my reading, this would make the following Avro schema invalid, because
>> the name "R" will not yet be defined when it's referenced inside the type
>> of the field F, because in depth-first order, the leaf is traversed before
>> the root.
>>
>> {
>>     "type": "record",
>>     "fields": [
>>         {"name": "F", "type": ["null", "R"]}
>>     ],
>>     "name": "R"
>> }
>>
>> It seems that types like this are valid in practice (I found the above
>> example in an Avro test suite), so could someone enlighten me as to how
>> this is allowed, please?
>>
>> Thanks for any info. If I'm asking in the wrong place, please advise me
>> of a better forum!
>>
>>     rog.
>>
>>
>>
