Hi,

Excuse my ignorance, but I'm not at all familiar with IDL. Is there an easy way to translate it to a JSON Avro schema, please? (preferably online :))
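Not online, but close: the Avro tools jar has an `idl2schemata` command that converts an IDL file into JSON schema files. A minimal sketch, assuming a local jar and an IDL file named `events.avdl` (both names are illustrative; any recent avro-tools release should work):

```shell
# Extract a .avsc JSON schema file for each named type in the IDL file.
java -jar avro-tools-1.11.3.jar idl2schemata events.avdl out/
# out/ then contains one .avsc file per record, e.g. MetaData.avsc

# Alternatively, convert the whole IDL protocol to a JSON protocol (.avpr):
java -jar avro-tools-1.11.3.jar idl events.avdl events.avpr
```

Note that classic IDL syntax wraps record definitions in a `protocol { ... }` block, so standalone records like the ones quoted below may need that wrapper first.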
cheers,
rog.

On Fri, 20 Dec 2019 at 21:06, Zoltan Farkas <zolyfar...@yahoo.com> wrote:

> Hi Roger,
>
> have you considered leveraging Avro logical types, and keeping the payload
> and event metadata "separate"?
>
> Here is an example (I'll use Avro IDL, since that is more readable to me :-) ):
>
>     record MetaData {
>       @logicalType("instant") string timeStamp;
>       ..... all the metadata fields ...
>     }
>
>     record CloudEvent {
>       MetaData metaData;
>       Any payload;
>     }
>
>     @logicalType("any")
>     record Any {
>       /** Here you have the schema of the data. For efficiency, you can
>        *  use a schema id + schema repo, or something like
>        *  https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroReferences */
>       string schema;
>
>       bytes data;
>     }
>
> This way, a system that is interested in the metadata does not even have to
> deserialize the payload.
>
> hope it helps.
>
> —Z
>
> On Dec 18, 2019, at 11:49 AM, roger peppe <rogpe...@gmail.com> wrote:
>
> Hi,
>
> Background: I've been contemplating the proposed Avro format in the
> CloudEvents specification
> <https://github.com/cloudevents/spec/blob/master/avro-format.md>, which
> defines standard metadata for events. It defines a very generic format for
> an event that allows storage of almost any data. It seems to me that by
> going in that direction it loses almost all the advantages of using Avro
> in the first place. It feels like it's trying to shoehorn a dynamic message
> format like JSON into the Avro format, where using Avro itself could do so
> much better.
>
> I'm hoping to propose something better. I had what I thought was a nice
> idea, but it doesn't *quite* work, so I thought I'd bring up the subject
> here and see if anyone had some better ideas.
>
> The schema resolution
> <https://avro.apache.org/docs/current/spec.html#Schema+Resolution> part
> of the spec allows a reader's schema to resolve data that was written
> with a schema containing extra fields.
> So, theoretically, we could define a CloudEvent something like this:
>
>     {
>       "name": "CloudEvent",
>       "type": "record",
>       "fields": [{
>         "name": "Metadata",
>         "type": {
>           "type": "record",
>           "name": "CloudEvent",
>           "namespace": "avro.apache.org",
>           "fields": [
>             { "name": "id", "type": "string" },
>             { "name": "source", "type": "string" },
>             { "name": "time", "type": "long", "logicalType": "timestamp-micros" }
>           ]
>         }
>       }]
>     }
>
> Theoretically, this could enable any event that's a record with *at
> least* a Metadata field with the above fields to be read generically.
> The CloudEvent type above could be seen as a structural supertype of
> all possible more-specific CloudEvent-compatible records that have such
> a compatible field.
>
> This has a few nice advantages:
>
> - there's no need for any wrapping of payload data.
> - the CloudEvent type can evolve over time like any other Avro type.
> - all the data message fields are immediately available alongside the
>   metadata.
> - there's still exactly one schema for a topic, encapsulating both the
>   metadata and the payload.
>
> However, this idea fails because of one problem: this schema resolution
> rule: "both schemas are records with the same (unqualified) name". This
> means that unless *everyone* names all their CloudEvent-compatible
> records "CloudEvent", they can't be read like this.
>
> I don't think people will be willing to name all their records
> "CloudEvent", so we have a problem.
>
> I can see a few possible workarounds:
>
> 1. when reading the record as a CloudEvent, read it with a schema
>    that's the same as CloudEvent, but with the top-level record name
>    changed to the top-level name of the schema that was used to write
>    the record.
> 2. ignore record names when matching schema record types.
> 3. allow aliases to be specified when writing data as well as reading
>    it. When defining a CloudEvent-compatible event, you'd add a
>    "CloudEvent" alias to your record.
>
> None of the options are particularly nice.
> Option 1 is probably the easiest to do, although it means you'd still
> need some custom logic when decoding records, meaning you couldn't use
> stock decoders.
>
> I like the idea of 2, although it gets a bit tricky when dealing with
> union types. You could define the matching such that it ignores names
> only when the match between the two types is unambiguous (i.e. only one
> record type in each union). This could be implemented as an option
> ("use structural typing") when decoding.
>
> 3 is probably the cleanest, but it interacts significantly with the
> spec (for example, the canonical schema transformation strips aliases
> out, but they'd need to be retained).
>
> Any thoughts? Is this a silly thing to be contemplating? Is there a
> better way?
>
> cheers,
> rog.
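For what it's worth, the rename in workaround 1 is mechanical enough to sketch. Here is a minimal illustration using only the Python standard library; the cut-down schema and the `OrderCreated` writer record are made up for the example. It rewrites the top-level name of the generic reader schema to match the writer's top-level name, before the schema would be handed to an ordinary resolving decoder:

```python
import json

# A cut-down version of the generic CloudEvent reader schema (illustrative).
CLOUD_EVENT_READER = {
    "type": "record",
    "name": "CloudEvent",
    "fields": [{
        "name": "Metadata",
        "type": {
            "type": "record",
            "name": "CloudEvent",
            "namespace": "avro.apache.org",
            "fields": [
                {"name": "id", "type": "string"},
                {"name": "source", "type": "string"},
            ],
        },
    }],
}

def reader_schema_for(writer_schema: dict) -> dict:
    """Workaround 1: clone the generic reader schema, renaming its top-level
    record to the writer's top-level name (and namespace) so that Avro's
    "records with the same (unqualified) name" resolution rule is satisfied."""
    reader = json.loads(json.dumps(CLOUD_EVENT_READER))  # cheap deep copy
    reader["name"] = writer_schema["name"]
    if "namespace" in writer_schema:
        reader["namespace"] = writer_schema["namespace"]
    else:
        reader.pop("namespace", None)
    return reader

# A producer's own record name (hypothetical); payload fields elided.
writer = {"type": "record", "name": "OrderCreated",
          "namespace": "com.example", "fields": []}
patched = reader_schema_for(writer)
```

The point this makes concrete: the only custom step is the rename itself, but because it has to happen per writer schema before resolution, it can't be done with a stock decoder alone, which is the drawback noted above.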