Re: Schema Evolution & Json Schemas

Andrew Otto Sun, 25 Feb 2024 07:35:54 -0800

Hi!

I'm not sure if this is totally relevant for you, but we use JSONSchema and
JSON with Flink at the Wikimedia Foundation.
We explicitly disallow the use of additionalProperties
<https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#No_object_additionalProperties>,
unless it is to define Map type fields
<https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#map_types>
(where additionalProperties itself is a schema).
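
To illustrate why this convention fits typed records, here is a minimal
sketch in plain Java (standard library only). It assumes a schema field
declared as `{"type": "object", "additionalProperties": {"type": "string"}}`,
which corresponds to an ordinary map-typed field with a matching getter and
setter. The class names `PageEvent` and `GeneratedLike` and the helper
`hasBeanAccessors` are hypothetical. The check only approximates the
getter/setter part of Flink's POJO rules using `java.beans.Introspector`; it
is not Flink's own type extraction. `GeneratedLike` imitates the accessor
shape discussed in the quoted message below (a getter paired with a
two-argument `setAdditionalProperty`), without any Jackson annotations.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.HashMap;
import java.util.Map;

public class MapFieldSketch {

    // Hypothetical event class: `tags` is a map-typed field whose values
    // all share one type, exposed through a conventional getter/setter pair.
    public static class PageEvent {
        private String title;
        private Map<String, String> tags = new HashMap<>();

        public PageEvent() {}

        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
        public Map<String, String> getTags() { return tags; }
        public void setTags(Map<String, String> tags) { this.tags = tags; }
    }

    // Hypothetical class imitating an open-ended additionalProperties holder:
    // the getter has no setter with the matching name and signature.
    public static class GeneratedLike {
        private Map<String, Object> additionalProperties = new HashMap<>();

        public GeneratedLike() {}

        public Map<String, Object> getAdditionalProperties() { return additionalProperties; }
        public void setAdditionalProperty(String name, Object value) {
            additionalProperties.put(name, value);
        }
    }

    // Rough approximation only: true if the class exposes at least one bean
    // property and every bean property has both a getter and a setter.
    public static boolean hasBeanAccessors(Class<?> type) {
        try {
            PropertyDescriptor[] props =
                Introspector.getBeanInfo(type, Object.class).getPropertyDescriptors();
            if (props.length == 0) {
                return false;
            }
            for (PropertyDescriptor p : props) {
                if (p.getReadMethod() == null || p.getWriteMethod() == null) {
                    return false;
                }
            }
            return true;
        } catch (IntrospectionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasBeanAccessors(PageEvent.class));     // prints true
        System.out.println(hasBeanAccessors(GeneratedLike.class)); // prints false
    }
}
```

A check like this could run in a unit test over generated classes to flag
accessor shapes that would likely push Flink onto the Kryo fallback.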

We have JSONSchema converters and JSON Serdes to be able to use our
JSONSchemas and JSON records with both the DataStream API (as Row) and
Table API (as RowData).

See:
-
https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json
-
https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/#managing-a-object

State schema evolution is supported via the EventRowTypeInfo wrapper
<https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/EventRowTypeInfo.java#42>.

Less directly about Flink: I gave a talk at Confluent's Current conf in
2022 about why we use JSONSchema
<https://www.confluent.io/events/current-2022/wikipedias-event-data-platform-or-json-is-okay-too/>.
See also this blog post series if you are interested:
<https://techblog.wikimedia.org/2020/09/10/wikimedias-event-data-platform-or-json-is-ok-too/>

-Andrew Otto
 Wikimedia Foundation

On Fri, Feb 23, 2024 at 1:58 AM Salva Alcántara <[email protected]>
wrote:

> I'm facing some issues related to schema evolution in combination with the
> usage of Json Schemas and I was just wondering whether there are any
> recommended best practices.
>
> In particular, I'm using the following code generator:
>
> - https://github.com/joelittlejohn/jsonschema2pojo
>
> Main gotchas so far relate to the `additionalProperties` field. When
> setting that to true, the resulting POJO is not valid according to Flink
> rules because the generated getter/setter methods don't follow the java
> beans naming conventions, e.g., see here:
>
> - https://github.com/joelittlejohn/jsonschema2pojo/issues/1589
>
> This means that the Kryo fallback is used for serialization purposes,
> which is not only bad for performance but also breaks state schema
> evolution.
>
> So, because of that, setting `additionalProperties` to `false` looks like
> a good idea but then your job will break if an upstream/producer service
> adds a property to the messages you are reading. To solve this problem, the
> POJOs for your job (as a reader) can be generated to ignore the
> `additionalProperties` field (via the `@JsonIgnore` Jackson annotation).
> This seems to be a good overall solution to the problem, but looks a bit
> convoluted to me / didn't come without some trial & error (= pain &
> frustration).
>
> Is there anyone here facing similar issues? It would be good to hear your
> thoughts on this!
>
> BTW, this is a very interesting article that touches on the
> above-mentioned difficulties:
> -
> https://www.creekservice.org/articles/2024/01/09/json-schema-evolution-part-2.html
>
>
>
