Hi David, Thanks for the FLIP. +1 for it. I have a minor comment.
Can you please elaborate more on mechanisms in place to ensure data consistency and integrity, particularly in the event of schema conflicts? Since each message includes a schema ID for inbound and outbound messages, can you elaborate more on message consistency in the context of schema evolution? Regards, Jeyhun On Wed, Mar 20, 2024 at 4:34 PM David Radley <david...@apache.org> wrote: > Thank you very much for your feedback Mark. I have made the changes in the > latest google document. On reflection I agree with you that the > globalIdPlacement format configuration should apply to the deserialization > as well, so it is declarative. I am also going to have a new configuration > option to work with content IDs as well as global IDs. In line with the > deser Apicurio IdHandler and headerHandlers. > > kind regards, David. > > > On 2024/03/20 15:18:37 Mark Nuttall wrote: > > +1 to this > > > > A few small comments: > > > > Currently, if users have Avro schemas in an Apicurio Registry (an open > source Apache 2 licensed schema registry), then the natural way to work > with those Avro flows is to use the schemas in the Apicurio Repository. > > 'those Avro flows' ... this is the first reference to flows. > > > > The new format will use the global Id to look up the Avro schema that > the message was written during deserialization. > > I get the point, phrasing is awkward. Probably you're more interested in > content than word polish at this point though. > > > > The Avro Schema Registry (apicurio-avro) format > > The Confluent format is called avro-confluent; this should be > avro-apicurio > > > > How to create tables with Apicurio-avro format > > s/Apicurio-avro/avro-apicurio/g > > > > HEADER – globalId is put in the header > > LEGACY– global Id is put in the message as a long > > CONFLUENT - globalId is put in the message as an int. > > Please could we specify 'four-byte int' and 'eight-byte long' ? > > > > For a Kafka source the globalId will be looked for in this order: > > - In the header > > - After a magic byte as an int > > - After a magic byte as a long. > > but apicurio-avro.globalid-placement has a default value of HEADER : why > do we have a search order as well? Isn't apicurio-avro.globalid-placement > enough? Don't the two mechanisms conflict? > > > > In addition to the types listed there, Flink supports reading/writing > nullable types. Flink maps nullable types to Avro union(something, null), > where something is the Avro type converted from Flink type. > > Is that definitely the right way round? I know we've had multiple > conversations about how unions work with Flink > > > > This is because the writer schema is expanded, but this could not > complete if there are circularities. > > I understand your meaning but the sentence is awkward. > > > > The registered schema will be created or if it exists be updated. > > same again > > > > At some stage the lowest Flink level supported by the Kafka connector > will contain the additionalProperties methods in code flink. > > wording > > > > There existing Kafka deserialization for the writer schema passes down > the message body to be deserialised. > > wording > > > > @Override > > public void deserialize(ConsumerRecord<byte[], byte[]> message, > Collector<T> out) > > throws IOException { > > Map<String, Object> additionalPropertiesMap = new HashMap<>(); > > for (Header header : message.additionalProperties()) { > > headersMap.put(header.key(), header.value()); > > } > > deserializationSchema.deserialize(message.value(), headersMap, > out); > > } > > This fails to compile at headersMap. > > > > The input stream and additionalProperties will be sent so the Apicurio > SchemaCoder which will try getting the globalId from the headers, then 4 > bytes from the payload then 8 bytes from the payload. > > I'm still stuck on apicurio-avro.globalid-placement having a default > value of HEADER . Should we try all three, or fail if this config param has > a wrong value? > > > > Other considerations > > The implementation does not use the Apicurio deser libraries, > > Please can we refer to them as SerDes; this is the term used within the > documentation that you link to > > > > > > On 2024/03/20 10:09:08 David Radley wrote: > > > Hi, > > > As per the FLIP process I would like to raise a FLIP, but do not have > authority, so have created a google doc for the Flip to introduce a new > Apicurio Avro format. The document is > https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing > > > > > > I have prototyped a lot of the content to prove that this approach is > feasible. I look forward to the discussion, > > > Kind regards, David. > > > > > > > > > > > > Unless otherwise stated above: > > > > > > IBM United Kingdom Limited > > > Registered in England and Wales with number 741598 > > > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU > > > > > >