Hi Yacine,

Thanks for the proposal. It sounds interesting, but I want to make sure there's a clear use case for this, because it's a significant change to the current evolution rules. Right now we guarantee that a reader will get an error if the data has an unknown union branch, rather than getting a default value. I think that makes sense: if the reader is requesting a field, it should get the actual datum rather than a default for a branch it doesn't know how to handle.
Could you give us an example use case that requires this new logic? I just want to make sure we can't solve your problem another way. For example, pushing evolution lower in the schema usually does the trick: rather than having ["null", "RecordV1"] => ["null", "RecordV1", "RecordV2"], it is usually better to update the record so that older readers can ignore the new fields.

Thanks,
rb

On Mon, Mar 14, 2016 at 7:30 AM, Yacine Benabderrahmane <[email protected]> wrote:

> Hi all,
>
> In order to provide a solution to the union schema evolution problem, as it was raised earlier in the thread "add a type to a union" <http://search-hadoop.com/m/F2svI1IXrQS1bIFgU1/union+evolution&subj=add+a+type+to+a+union> on the user mailing list, we decided, for the needs of the reactive architecture we have implemented for one of our clients, to implement an evolution of Avro's compatibility principle for unions. As a reminder, the question asked was about how to handle the case where a reader, using an old version of a schema that includes a union, reads data written with a new version of the schema where a type has been added to the union.
>
> As Martin Kleppmann answered in that thread, one way to handle this kind of evolution (a new version of the schema adds a new type to a union) would be to ensure that all the development streams have integrated the new schema B before deploying it to the IT schema repository. However, in large organizations involving strongly uncorrelated teams (from the product life-cycle point of view), this approach turns out to be quite impracticable, causing production stream congestion, blocking behavior between teams, and a bunch of other unwanted counter-agile / counter-reactive phenomena...
>
> Therefore, we had to implement a new *compatibility mode* for unions, while taking care to comply with the following rules:
>
> 1.
> Clear rules of compatibility are stated and integrated for this compatibility mode
> 2. The standard Avro behavior must be kept intact
> 3. All of the evolution's implementation must be done without introducing any regression (all existing tests of the Avro stack must succeed)
> 4. The code impact on the Avro stack must be minimized
>
> Just to give you a very brief overview (as I don't know if this is actually the place for a fully detailed description), the evolution addresses the typical problem where two development streams use the same schema but in different versions, in the case described briefly as follows:
>
> - The first development stream, called "DevA", uses version A of a schema which includes a union referencing two types, say "null" and "string". The default value is set to null.
> - The second development stream, called "DevB", uses version B, which is an evolution of version A, as it adds a reference to a new type in the former union, say "long" (which makes it "null", "string" and "long").
> - When schema B is deployed to the schema repository (in our case, the IO Confluent Schema Registry) subsequently to version A:
>   - The stream "DevA" must be able to read with schema A, even if the data has been written using schema B with the type "long" in the union. In the latter case, the read value is the union default value.
>   - The stream "DevB" must be able to read/write with schema B, even if it writes the data using the type "long" in the union.
>
> The evolution that we implemented for this mode includes some rules that are based on the principles stated in the Avro documentation. It is even more powerful than shown in the few lines above, as it enables readers to get the default value of the union if the schema used for reading does not contain the type used by the writer in the union. This achieves a new mode of forward / backward compatibility.
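The two schema versions in the DevA/DevB scenario described above could look something like this (a sketch only; the record and field names are hypothetical, not taken from the thread). Version A comes first, followed by version B, which adds "long" to the union:

```json
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "payload", "type": ["null", "string"], "default": null}
  ]
}

{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "payload", "type": ["null", "string", "long"], "default": null}
  ]
}
```

Under standard Avro schema resolution, a DevA reader resolving a "long" datum against the first schema gets an error, since its union has no matching branch; under the proposed mode it would instead receive the union's default, null.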
> This evolution is working perfectly for now, and should be in production in the coming weeks. We have also made an evolution of the IO Confluent Schema Registry stack to support it, again in a transparent manner (we also intend to contribute to this stack in a second / parallel step).
>
> With the objective of contributing this new compatibility mode for unions to the Avro stack, I have some questions about the procedure:
>
> 1. How should I make the contribution proposal? Should I directly provide a patch in JIRA and dive into the details right there?
> 2. The base version of this evolution is 1.7.7; is it eligible for contribution evaluation anyway?
>
> Thanks in advance, looking forward to hearing from you and giving you more details.
>
> Kind Regards,
> --
> *Yacine Benabderrahmane*
> Architect
> *OCTO Technology* <http://www.octo.com>
> -----------------------------------------------
> Tel : +33 6 10 88 25 98
> 50 avenue des Champs Elysées
> 75008 PARIS
> www.octo.com

-- 
Ryan Blue
Software Engineer
Netflix
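The semantics under discussion — standard Avro erroring when the writer used a union branch the reader's schema lacks, versus the proposed mode falling back to the union's default — can be sketched with a toy resolver. This is a hypothetical model for illustration only, not Avro's actual resolution code; the function and parameter names are invented:

```python
def resolve_union(reader_branches, writer_branch, value, default, lenient=False):
    """Resolve a datum written as `writer_branch` against a reader's union.

    Standard behavior: raise if the reader's union has no matching branch.
    Proposed mode (lenient=True): return the union's default value instead.
    """
    if writer_branch in reader_branches:
        # Reader knows this branch: hand back the actual datum.
        return value
    if lenient:
        # Proposed compatibility mode: unknown branch resolves to the default.
        return default
    # Standard Avro-style behavior: unknown branch is a resolution error.
    raise ValueError(
        "branch %r not found in reader union %r" % (writer_branch, reader_branches))


# DevA reads with the version-A union ["null", "string"];
# DevB wrote a datum using the new "long" branch from version B.
reader = ["null", "string"]
print(resolve_union(reader, "long", 42, None, lenient=True))   # -> None
print(resolve_union(reader, "string", "hi", None))             # -> hi
```

Calling `resolve_union(reader, "long", 42, None)` without `lenient=True` raises, mirroring the guarantee Ryan describes at the top of the thread.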
