Somewhat relevant: here is a CUE schema for Avro schemas that I wrote a little while ago, which can be used to check Avro schema compliance to a degree: <https://github.com/heetch/cue-schema/blob/bb31c583dd09b87a4f1a206290bd57be39b6382f/avro/schema/schema.cue> (if you haven't heard of CUE, there's a bunch of info on it at cuelang.org).
My understanding of Avro was somewhat less then, so it's probably wrong in parts, and it's definitely not as strict as it could be, but I've found it useful, and it has lots of room for improvement.

  cheers,
    rog.

On Fri, 6 Dec 2019 at 17:43, Jonah H. Harris <jonah.har...@gmail.com> wrote:

> On Fri, Dec 6, 2019 at 12:16 PM Ryan Skraba <r...@skraba.com> wrote:
>
>> Hello! Yes, it looks like `fixed` is the only named complex type that
>> doesn't have a doc attribute. No primitive types have the doc
>> attribute.
>>
>> This might be an omission, but I don't think it's inconsistent. In my
>> experience, there's no compelling reason to document schemas of
>> primitive types, but a good practice for the fields or container types
>> that they're inside. Fixed is not a primitive type, but in practice
>> it's used like bytes (which is).
>>
>
> Hey, Ryan. Thanks for getting back to me so quickly.
>
> Yeah. I don't think primitive types need the doc attribute. As fixed is
> complex and can be an independent type, however, I thought that was
> inconsistent with the other complex types.
>
>> In my opinion, I wouldn't consider it important to make the doc
>> attribute universal on any type/field, but I wouldn't have any strong
>> objection if that were the consensus. Today, I'm pretty sure that the
>> Java implementation corresponds to the spec with regards to the doc
>> attribute.
>>
>
> Agreed.
>
>> As a minimum, I'd propose that the only action here is to change the
>> IDL guide: "Comments that begin with /** are used as the documentation
>> string (if applicable) for the type or field definition that follows
>> the comment."
>>
>> Is this what you're looking for?
>>
>
> Yes. We're actually using the doc string to store not only a textual
> description of the field/type, but also a set of annotations used for event
> storage and data masking. The main reason we wanted doc to be consistent
> for all complex types (including fixed) is that it permits us to easily
> tell what complex objects can exist across the ecosystem directly from our
> schema repository. Initially, we wanted to use a separate internal
> attribute (similar to the lenses obfuscate attribute approach --
> https://docs.lenses.io/2.0/install_setup/datagovernance/index.html#data-anonymization
> -- but
> we've found several Avro tools strip out all non-spec-compliant attributes.
> This leaves us only the doc field.
>
>> P.S. I'm very intrigued by the "thorough schema compliance checker"!
>> Is this something that would be shared? Would it help find other
>> inconsistencies in the Avro spec and implementations?
>>
>
> Yes, this will be open-sourced.
>
> --
> Jonah H. Harris
>