Somewhat relevant: here is a CUE schema for Avro schemas
<https://github.com/heetch/cue-schema/blob/bb31c583dd09b87a4f1a206290bd57be39b6382f/avro/schema/schema.cue>
that I wrote a little while ago. It can be used to check Avro schema
compliance to a degree (if you haven't heard of CUE, there's a bunch of
info on it at cuelang.org).

My understanding of Avro was somewhat less then, so it's probably wrong in
parts, and it's definitely not as strict as it could be, but I've found it
useful, and it has lots of room for improvement.
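
For anyone who wants to try the kind of check discussed in this thread
without CUE, here is a minimal Python sketch. It is not the CUE schema
above and not part of any Avro library: the function name
named_types_without_doc and the path notation are made up for
illustration. It walks a parsed Avro schema and reports named complex
types (record, enum, fixed) that carry no "doc" attribute. Including
fixed in that set is an assumption that follows the consistency argument
quoted below, not the spec as described in this thread.

```python
import json

# Named complex types. Treating "fixed" as something that should carry
# "doc" is an assumption taken from the thread, not from the Avro spec.
NAMED_TYPES = {"record", "enum", "fixed"}


def named_types_without_doc(schema, path="$"):
    """Yield (path, type, name) for each named type definition lacking "doc".

    Hypothetical helper: `schema` is an Avro schema already parsed from JSON.
    """
    if isinstance(schema, list):  # a union: check every branch
        for i, branch in enumerate(schema):
            yield from named_types_without_doc(branch, f"{path}[{i}]")
        return
    if not isinstance(schema, dict):  # primitive name or reference by name
        return
    kind = schema.get("type")
    if isinstance(kind, (dict, list)):  # {"type": <nested schema>}
        yield from named_types_without_doc(kind, f"{path}.type")
        return
    if kind in NAMED_TYPES and "doc" not in schema:
        yield path, kind, schema.get("name")
    if kind == "record":
        for field in schema.get("fields", []):
            yield from named_types_without_doc(
                field.get("type"), f"{path}.{field.get('name')}"
            )
    elif kind == "array":
        yield from named_types_without_doc(schema.get("items"), f"{path}.items")
    elif kind == "map":
        yield from named_types_without_doc(schema.get("values"), f"{path}.values")


if __name__ == "__main__":
    example = json.loads("""
    {"type": "record", "name": "Event", "doc": "An event.",
     "fields": [
       {"name": "id", "type": {"type": "fixed", "name": "Id", "size": 16}},
       {"name": "kind", "type": ["null",
          {"type": "enum", "name": "Kind", "symbols": ["A", "B"]}]}
     ]}
    """)
    for hit in named_types_without_doc(example):
        print(hit)
```

Note that this only looks at type definitions, not at the "doc" attribute
of record fields, and it does no validation beyond that.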

  cheers,
    rog.



On Fri, 6 Dec 2019 at 17:43, Jonah H. Harris <jonah.har...@gmail.com> wrote:

> On Fri, Dec 6, 2019 at 12:16 PM Ryan Skraba <r...@skraba.com> wrote:
>
>> Hello!  Yes, it looks like `fixed` is the only named complex type that
>> doesn't have a doc attribute.  No primitive types have the doc
>> attribute.
>>
>> This might be an omission, but I don't think it's inconsistent.  In my
>> experience, there's no compelling reason to document schemas of
>> primitive types, but a good practice for the fields or container types
>> that they're inside.  Fixed is not a primitive type, but in practice
>> it's used like bytes (which is).
>>
>
> Hey, Ryan. Thanks for getting back to me so quickly.
>
> Yeah. I don't think primitive types need the doc attribute. As fixed is
> complex and can be an independent type, however, I thought that was
> inconsistent with the other complex types.
>
>
>> In my opinion, I wouldn't consider it important to make the doc
>> attribute universal on any type/field, but I wouldn't have any strong
>> objection if that were the consensus.  Today, I'm pretty sure that the
>> Java implementation corresponds to the spec with regards to the doc
>> attribute.
>>
>
> Agreed.
>
>
>> As a minimum, I'd propose that the only action here is to change the
>> IDL guide: "Comments that begin with /** are used as the documentation
>> string (if applicable) for the type or field definition that follows
>> the comment."
>>
>> Is this what you're looking for?
>>
>
> Yes. We're actually using the doc string to store not only a textual
> description of the field/type, but also a set of annotations used for event
> storage and data masking. The main reason we wanted doc to be consistent
> for all complex types (including fixed) is that it permits us to easily
> tell what complex objects can exist across the ecosystem directly from our
> schema repository. Initially, we wanted to use a separate internal
> attribute, similar to the Lenses obfuscate attribute approach
> (https://docs.lenses.io/2.0/install_setup/datagovernance/index.html#data-anonymization),
> but we've found several Avro tools strip out all non-spec-compliant
> attributes. This leaves us only the doc field.
>
>> P.S. I'm very intrigued by the "thorough schema compliance checker"!
>> Is this something that would be shared? Would it help find other
>> inconsistencies in the Avro spec and implementations?
>>
>
> Yes, this will be open-sourced.
>
> --
> Jonah H. Harris
>
>
