I found a project [1] that converts various schema formats to and from
Avro. According to the README, Avrotize is a "Rosetta Stone" for data
structure definitions, allowing you to convert between numerous data and
database schema formats and generate code for different programming
languages.

In our organization, we receive and send messages in formats like EDI, XML,
JSON, and CSV. Having their payloads in an Avro schema in our distribution
and processing layer would greatly simplify our architecture.

After converting some of our schemas using the tool, I noticed two things
about how enums are handled:

   - Enum values that start with a digit are prefixed with an underscore
   (e.g., "400" becomes "_400"). This follows the Avro specification, which
   requires names to match [A-Za-z_][A-Za-z0-9_]*, but I can't find any
   explanation of why an enum symbol cannot start with a digit.
   - Enum descriptions are dropped, as an Avro enum holds only the symbol
   names in the schema (the "doc" attribute applies to the enum as a whole,
   not to individual symbols). The author of Avrotize has extended the Avro
   spec to address this limitation [2].
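
To make the first point concrete, here is a minimal Python sketch of the
prefixing as I understand it. The regex is the name pattern from the Avro
spec; to_symbol is a hypothetical helper of my own for illustration, not
Avrotize's actual code, and its handling of other invalid characters is an
assumption.

```python
import re

# Name pattern given in the Avro specification for names and enum symbols.
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_symbol(symbol: str) -> bool:
    """Return True if the string is a legal Avro enum symbol."""
    return AVRO_NAME.fullmatch(symbol) is not None


def to_symbol(value: str) -> str:
    """Hypothetical sanitizer (an assumption, not Avrotize's implementation).

    Replaces characters outside [A-Za-z0-9_] with an underscore, then
    prefixes an underscore if the result is empty or starts with a digit.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", value)
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


if __name__ == "__main__":
    for raw in ["400", "OK", "not-found"]:
        print(raw, is_valid_symbol(raw), to_symbol(raw))
```

Note that this mapping is lossy: both "400" and "_400" end up as "_400", so
the original value would have to be kept somewhere else if it matters.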

I'm curious about how the community views the evolution of the Avro spec
and its role in new use cases.

Cheers,
Jeroen

[1] https://github.com/clemensv/avrotize
[2] https://github.com/clemensv/avrotize/blob/master/specs/avrotize-schema.md
