>  the following code generator
Oh, and FWIW, we avoid code generation and POJOs, and instead rely on
Flink's Row and RowData abstractions.
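
A minimal sketch of what that looks like with the DataStream API (field
names invented here, not an actual event schema):

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.typeutils.RowTypeInfo;
    import org.apache.flink.types.Row;

    public class RowSketch {
        public static void main(String[] args) {
            // Describe the record shape once, instead of generating a POJO class.
            TypeInformation<Row> typeInfo = new RowTypeInfo(
                    new TypeInformation<?>[] {Types.STRING, Types.LONG},
                    new String[] {"page_title", "rev_id"});

            // Build records by field name; serialization comes from typeInfo,
            // so there are no JavaBeans naming rules to get wrong.
            Row event = Row.withNames();
            event.setField("page_title", "Main_Page");
            event.setField("rev_id", 12345L);
            System.out.println(event + " :: " + typeInfo);
        }
    }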





On Sun, Feb 25, 2024 at 10:35 AM Andrew Otto <o...@wikimedia.org> wrote:

> Hi!
>
> I'm not sure if this is totally relevant for you, but we use JSONSchema
> and JSON with Flink at the Wikimedia Foundation.
> We explicitly disallow the use of additionalProperties
> <https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#No_object_additionalProperties>,
> unless it is to define Map type fields
> <https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#map_types>
> (where additionalProperties itself is a schema).
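>
> A simplified example (field name invented, not one of our production
> schemas): on the JSONSchema side a map-type field is an object whose
> additionalProperties is itself a schema, and on the Flink side it
> becomes a MAP type rather than a nested Row:
>
>     // JSONSchema fragment (sketch):
>     //
>     //   "labels": {
>     //     "type": "object",
>     //     "additionalProperties": { "type": "string" }
>     //   }
>     //
>     // Corresponding Flink type:
>     import org.apache.flink.api.common.typeinfo.TypeInformation;
>     import org.apache.flink.api.common.typeinfo.Types;
>
>     public class MapFieldSketch {
>         public static void main(String[] args) {
>             TypeInformation<?> labelsType = Types.MAP(Types.STRING, Types.STRING);
>             System.out.println(labelsType);
>         }
>     }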
>
> We have JSONSchema converters and JSON Serdes to be able to use our
> JSONSchemas and JSON records with both the DataStream API (as Row) and
> Table API (as RowData).
>
> See:
> -
> https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json
> -
> https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/#managing-a-object
>
> State schema evolution is supported via the EventRowTypeInfo wrapper
> <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/EventRowTypeInfo.java#42>
> .
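>
> Roughly, the generic shape of how an explicit Row TypeInformation gets
> attached to a stream looks like this (plain RowTypeInfo shown for
> illustration; EventRowTypeInfo is passed in the same position):
>
>     import org.apache.flink.api.common.typeinfo.TypeInformation;
>     import org.apache.flink.api.common.typeinfo.Types;
>     import org.apache.flink.api.java.typeutils.RowTypeInfo;
>     import org.apache.flink.streaming.api.datastream.DataStream;
>     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
>     import org.apache.flink.types.Row;
>
>     public class ReturnsSketch {
>         public static void main(String[] args) throws Exception {
>             StreamExecutionEnvironment env =
>                     StreamExecutionEnvironment.getExecutionEnvironment();
>
>             TypeInformation<Row> rowType = new RowTypeInfo(
>                     new TypeInformation<?>[] {Types.STRING},
>                     new String[] {"wiki"});
>
>             DataStream<Row> rows = env
>                     .fromElements("enwiki", "dewiki")
>                     .map(wiki -> Row.of(wiki))
>                     // Pin the Row type info explicitly so Flink does not
>                     // fall back to Kryo for Row.
>                     .returns(rowType);
>
>             rows.print();
>             env.execute("returns-sketch");
>         }
>     }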
>
> Less directly about Flink: I gave a talk at Confluent's Current conf in
> 2022 about why we use JSONSchema
> <https://www.confluent.io/events/current-2022/wikipedias-event-data-platform-or-json-is-okay-too/>.
> See also this blog post series, if you are interested:
> <https://techblog.wikimedia.org/2020/09/10/wikimedias-event-data-platform-or-json-is-ok-too/>
>
> -Andrew Otto
>  Wikimedia Foundation
>
>
> On Fri, Feb 23, 2024 at 1:58 AM Salva Alcántara <salcantara...@gmail.com>
> wrote:
>
>> I'm facing some issues related to schema evolution in combination with
>> JSON Schemas, and I was wondering whether there are any recommended
>> best practices.
>>
>> In particular, I'm using the following code generator:
>>
>> - https://github.com/joelittlejohn/jsonschema2pojo
>>
>> The main gotchas so far relate to the `additionalProperties` field. When
>> that is set to true, the resulting POJO is not valid according to Flink's
>> rules because the generated getter/setter methods don't follow the
>> JavaBeans naming conventions; see, e.g.:
>>
>> - https://github.com/joelittlejohn/jsonschema2pojo/issues/1589
>>
>> This means that the Kryo fallback is used for serialization purposes,
>> which is not only bad for performance but also breaks state schema
>> evolution.
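>>
>> Roughly, the generated accessor pair has this shape (simplified from the
>> real jsonschema2pojo output), and the get/set asymmetry is what trips up
>> Flink's POJO analysis:
>>
>>     import com.fasterxml.jackson.annotation.JsonAnyGetter;
>>     import com.fasterxml.jackson.annotation.JsonAnySetter;
>>     import java.util.HashMap;
>>     import java.util.Map;
>>
>>     public class GeneratedEvent {
>>         private Map<String, Object> additionalProperties = new HashMap<>();
>>
>>         @JsonAnyGetter
>>         public Map<String, Object> getAdditionalProperties() {
>>             return this.additionalProperties;
>>         }
>>
>>         // The setter name and signature don't match the getter, so Flink
>>         // finds no valid getter/setter pair for the private field and
>>         // falls back to Kryo.
>>         @JsonAnySetter
>>         public void setAdditionalProperty(String name, Object value) {
>>             this.additionalProperties.put(name, value);
>>         }
>>     }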
>>
>> Because of that, setting `additionalProperties` to `false` looks like a
>> good idea, but then your job will break if an upstream/producer service
>> adds a property to the messages you are reading. To solve this, the POJOs
>> for your job (as a reader) can be generated to ignore the
>> `additionalProperties` field (via the `@JsonIgnore` Jackson annotation).
>> This seems like a good overall solution, but it looks a bit convoluted to
>> me and didn't come without some trial & error (= pain & frustration).
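>>
>> For reference, the shape I end up wanting on the reader side is just a
>> plain Flink-friendly POJO (hand-written sketch, field names invented):
>>
>>     // Satisfies Flink's POJO rules: public class, public no-arg
>>     // constructor, and JavaBeans-style getters/setters, so the
>>     // PojoSerializer is used and state schema evolution keeps working.
>>     public class ReaderEvent {
>>         private String id;
>>         private long timestamp;
>>
>>         public ReaderEvent() {}
>>
>>         public String getId() { return id; }
>>         public void setId(String id) { this.id = id; }
>>
>>         public long getTimestamp() { return timestamp; }
>>         public void setTimestamp(long timestamp) { this.timestamp = timestamp; }
>>     }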
>>
>> Is there anyone here facing similar issues? It would be good to hear your
>> thoughts on this!
>>
>> BTW, this is a very interesting article that touches on the
>> above-mentioned difficulties:
>> -
>> https://www.creekservice.org/articles/2024/01/09/json-schema-evolution-part-2.html
>>
>>
>>
