Awesome Andrew, thanks a lot for the info!

On Sun, Feb 25, 2024 at 4:37 PM Andrew Otto <o...@wikimedia.org> wrote:
> > the following code generator
>
> Oh, and FWIW we avoid code generation and POJOs, and instead rely on
> Flink's Row or RowData abstractions.
>
> On Sun, Feb 25, 2024 at 10:35 AM Andrew Otto <o...@wikimedia.org> wrote:
>
>> Hi!
>>
>> I'm not sure if this is totally relevant for you, but we use JSONSchema
>> and JSON with Flink at the Wikimedia Foundation.
>> We explicitly disallow the use of additionalProperties
>> <https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#No_object_additionalProperties>,
>> unless it is to define Map type fields
>> <https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#map_types>
>> (where additionalProperties itself is a schema).
>>
>> We have JSONSchema converters and JSON Serdes that let us use our
>> JSONSchemas and JSON records with both the DataStream API (as Row) and
>> the Table API (as RowData).
>>
>> See:
>> - https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json
>> - https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/#managing-a-object
>>
>> State schema evolution is supported via the EventRowTypeInfo wrapper
>> <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/EventRowTypeInfo.java#42>.
>>
>> Less directly about Flink: I gave a talk at Confluent's Current conf in
>> 2022 about why we use JSONSchema
>> <https://www.confluent.io/events/current-2022/wikipedias-event-data-platform-or-json-is-okay-too/>.
>> See also this blog post series if you are interested
>> <https://techblog.wikimedia.org/2020/09/10/wikimedias-event-data-platform-or-json-is-ok-too/>!
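[Editor's note: a minimal sketch of the two uses of `additionalProperties` Andrew describes, with hypothetical field names. On an ordinary object, `additionalProperties: false` rejects unknown keys; on a field like `labels` below, `additionalProperties` set to a schema turns the object into a Map-typed field whose values must all match that schema.]

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "user_id": { "type": "string" },
    "labels": {
      "type": "object",
      "additionalProperties": { "type": "string" }
    }
  }
}
```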
>>
>> -Andrew Otto
>>  Wikimedia Foundation
>>
>> On Fri, Feb 23, 2024 at 1:58 AM Salva Alcántara <salcantara...@gmail.com> wrote:
>>
>>> I'm facing some issues related to schema evolution in combination with
>>> the usage of JSON Schemas, and I was wondering whether there are any
>>> recommended best practices.
>>>
>>> In particular, I'm using the following code generator:
>>>
>>> - https://github.com/joelittlejohn/jsonschema2pojo
>>>
>>> The main gotchas so far relate to the `additionalProperties` field. When
>>> setting it to true, the resulting POJO is not valid according to Flink's
>>> rules, because the generated getter/setter methods don't follow the
>>> Java Beans naming conventions; e.g., see here:
>>>
>>> - https://github.com/joelittlejohn/jsonschema2pojo/issues/1589
>>>
>>> This means that the Kryo fallback is used for serialization purposes,
>>> which is not only bad for performance but also breaks state schema
>>> evolution.
>>>
>>> Because of that, setting `additionalProperties` to `false` looks like a
>>> good idea, but then your job will break if an upstream/producer service
>>> adds a property to the messages you are reading. To solve this problem,
>>> the POJOs for your job (as a reader) can be generated so that the
>>> `additionalProperties` field is ignored (via the `@JsonIgnore` Jackson
>>> annotation). This seems to be a good overall solution to the problem,
>>> but it looks a bit convoluted to me and didn't come without some trial
>>> & error (= pain & frustration).
>>>
>>> Is anyone here facing similar issues? It would be good to hear your
>>> thoughts on this!
>>>
>>> BTW, this is a very interesting article that touches on the
>>> above-mentioned difficulties:
>>> - https://www.creekservice.org/articles/2024/01/09/json-schema-evolution-part-2.html
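[Editor's note: for context on why the generated code trips Flink up, the linked issue shows that jsonschema2pojo pairs a `getAdditionalProperties()` getter with a `setAdditionalProperty(String, Object)` setter (singular name, different signature), so Flink's POJO analyzer finds no matching getter/setter pair and falls back to Kryo. Below is a minimal plain-Java sketch, with a hypothetical `SensorEvent` type, of the shape Flink's POJO rules do accept; it is an illustration of the convention, not code from the thread.]

```java
// Sketch of a Flink-compatible POJO (hypothetical fields). Flink's
// PojoSerializer (which supports state schema evolution) requires a
// public class, a public no-arg constructor, and every non-public field
// exposed through a Java Beans pair: `T getX()` / `void setX(T)` for
// a field `x`, with matching names and types.
public class SensorEvent {
    // Public fields are accepted directly by the POJO analyzer.
    public String id;

    // A private field also works, as long as the accessors below match
    // the Beans convention exactly.
    private long timestampMs;

    public SensorEvent() {}  // the required public no-arg constructor

    public long getTimestampMs() {
        return timestampMs;
    }

    public void setTimestampMs(long timestampMs) {
        this.timestampMs = timestampMs;
    }
}
```

By contrast, the generated `setAdditionalProperty(String, Object)` has no getter of the same name and type, so the class fails this check, which is exactly why the Kryo fallback kicks in.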