Hi Yacine,

Thanks for the proposal. It sounds interesting, but I want to make sure there's a clear use case for this, because it's a significant change to the current evolution rules. Right now we guarantee that a reader will get an error if the data has an unknown union branch, rather than getting a default value. I think that makes sense: if the reader is requesting a field, it should get the actual datum rather than a default for a branch it doesn't know how to handle.
Could you give us an example use case that requires this new logic? I just want to make sure we can't solve your problem another way. For example, pushing evolution lower in the schema usually does the trick: rather than having ["null", "RecordV1"] => ["null", "RecordV1", "RecordV2"], it is usually better to update the record so that older readers can ignore the new fields.

Thanks,
rb

On Mon, Mar 14, 2016 at 7:30 AM, Yacine Benabderrahmane <[email protected]> wrote:

> Hi all,
>
> In order to provide a solution to the union schema evolution problem, as it was raised earlier in the thread "add a type to a union" <http://search-hadoop.com/m/F2svI1IXrQS1bIFgU1/union+evolution&subj=add+a+type+to+a+union> on the user mailing list, we decided, for the needs of the reactive architecture we have implemented for one of our clients, to implement an evolution of Avro's compatibility principle for unions. As a reminder, the question asked was about how to handle the case where a reader, using an old version of a schema that includes a union, reads data written with a new version of the schema where a type has been added to the union.
>
> As Martin Kleppmann answered in that thread, one way to handle this kind of evolution (a new version of the schema adds a new type to a union) would be to ensure that all the development streams have integrated the new schema B before deploying it to the IT schema repository. However, in large organizations involving strongly uncorrelated teams (from the product life-cycle point of view), this approach turns out to be quite impracticable, causing production stream congestion, blocking behavior between teams, and a bunch of other unwanted counter-agile / counter-reactive phenomena...
>
> Therefore, we had to implement a new *compatibility mode* for unions, while taking care to comply with the following rules:
>
> 1.
> Clear rules of compatibility are stated and integrated for this compatibility mode
> 2. The standard Avro behavior must be kept intact
> 3. All of the evolution's implementation must be done without introducing any regression (all existing tests of the Avro stack must succeed)
> 4. The code impact on the Avro stack must be minimized
>
> Just to give you a very brief overview (as I don't know if this is actually the place for a fully detailed description), the evolution addresses the typical problem where two development streams use the same schema but in different versions, in the case described briefly as follows:
>
> - The first development stream, called "DevA", uses version A of a schema which includes a union referencing two types, say "null" and "string". The default value is set to null.
> - The second development stream, called "DevB", uses version B, which is an evolution of version A, as it adds a reference to a new type in the former union, say "long" (which makes it "null", "string" and "long").
> - When schema B is deployed to the schema repository (in our case, the IO Confluent Schema Registry) subsequently to version A:
>   - The stream "DevA" must be able to read with schema A, even if the data has been written using schema B with the type "long" in the union. In the latter case, the read value is the union default value.
>   - The stream "DevB" must be able to read/write with schema B, even if it writes the data using the type "long" in the union.
>
> The evolution that we implemented for this mode includes some rules that are based on the principles stated in the Avro documentation. It is even more powerful than shown in the few lines above, as it enables readers to get the default value of the union if the schema used for reading does not contain the type used by the writer in the union. This achieves a new mode of forward / backward compatibility.
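The two schema versions in the DevA/DevB scenario described above could look something like this (a sketch only; the record and field names are hypothetical, not taken from the thread). Version A comes first, followed by version B, which adds "long" to the union:

```json
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "payload", "type": ["null", "string"], "default": null}
  ]
}

{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "payload", "type": ["null", "string", "long"], "default": null}
  ]
}
```

Under standard Avro schema resolution, a DevA reader resolving a "long" datum against the first schema gets an error, since its union has no matching branch; under the proposed mode it would instead receive the union's default, null.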
> This evolution is working perfectly for now, and should be in production in the coming weeks. We have also made an evolution of the IO Confluent Schema Registry stack to support it, again in a transparent manner (we also intend to contribute to this stack in a second / parallel step).
>
> With the objective of contributing this new compatibility mode for unions to the Avro stack, I have some questions about the procedure:
>
> 1. How should I make the contribution proposal? Should I directly provide a patch in JIRA and dive into the details right there?
> 2. The base version of this evolution is 1.7.7; is it eligible for contribution evaluation anyway?
>
> Thanks in advance, looking forward to hearing from you and giving you more details.
>
> Kind Regards,
> --
> *Yacine Benabderrahmane*
> Architect
> *OCTO Technology* <http://www.octo.com>
> -----------------------------------------------
> Tel : +33 6 10 88 25 98
> 50 avenue des Champs Elysées
> 75008 PARIS
> www.octo.com

-- 
Ryan Blue
Software Engineer
Netflix
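The semantics under discussion — standard Avro erroring when the writer used a union branch the reader's schema lacks, versus the proposed mode falling back to the union's default — can be sketched with a toy resolver. This is a hypothetical model for illustration only, not Avro's actual resolution code; the function and parameter names are invented:

```python
def resolve_union(reader_branches, writer_branch, value, default, lenient=False):
    """Resolve a datum written as `writer_branch` against a reader's union.

    Standard behavior: raise if the reader's union has no matching branch.
    Proposed mode (lenient=True): return the union's default value instead.
    """
    if writer_branch in reader_branches:
        # Reader knows this branch: hand back the actual datum.
        return value
    if lenient:
        # Proposed compatibility mode: unknown branch resolves to the default.
        return default
    # Standard Avro-style behavior: unknown branch is a resolution error.
    raise ValueError(
        "branch %r not found in reader union %r" % (writer_branch, reader_branches))


# DevA reads with the version-A union ["null", "string"];
# DevB wrote a datum using the new "long" branch from version B.
reader = ["null", "string"]
print(resolve_union(reader, "long", 42, None, lenient=True))   # -> None
print(resolve_union(reader, "string", "hi", None))             # -> hi
```

Calling `resolve_union(reader, "long", 42, None)` without `lenient=True` raises, mirroring the guarantee Ryan describes at the top of the thread.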
