Awesome Andrew, thanks a lot for the info!

On Sun, Feb 25, 2024 at 4:37 PM Andrew Otto <o...@wikimedia.org> wrote:
> > the following code generator
>
> Oh, and FWIW we avoid code generation and POJOs, and instead rely on
> Flink's Row or RowData abstractions.
>
> On Sun, Feb 25, 2024 at 10:35 AM Andrew Otto <o...@wikimedia.org> wrote:
>
>> Hi!
>>
>> I'm not sure if this is totally relevant for you, but we use JSONSchema
>> and JSON with Flink at the Wikimedia Foundation.
>> We explicitly disallow the use of additionalProperties
>> <https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#No_object_additionalProperties>,
>> unless it is to define Map type fields
>> <https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#map_types>
>> (where additionalProperties itself is a schema).
>>
>> We have JSONSchema converters and JSON Serdes that let us use our
>> JSONSchemas and JSON records with both the DataStream API (as Row) and
>> the Table API (as RowData).
>>
>> See:
>> - https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json
>> - https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/#managing-a-object
>>
>> State schema evolution is supported via the EventRowTypeInfo wrapper
>> <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/EventRowTypeInfo.java#42>.
>>
>> Less directly about Flink: I gave a talk at Confluent's Current conf in
>> 2022 about why we use JSONSchema
>> <https://www.confluent.io/events/current-2022/wikipedias-event-data-platform-or-json-is-okay-too/>.
>> See also this blog post series if you are interested
>> <https://techblog.wikimedia.org/2020/09/10/wikimedias-event-data-platform-or-json-is-ok-too/>!
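[Editor's note: a minimal sketch of the two uses of `additionalProperties` Andrew describes, with hypothetical field names. On an ordinary object, `additionalProperties: false` rejects unknown keys; on a field like `labels` below, `additionalProperties` set to a schema turns the object into a Map-typed field whose values must all match that schema.]

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "user_id": { "type": "string" },
    "labels": {
      "type": "object",
      "additionalProperties": { "type": "string" }
    }
  }
}
```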
>>
>> -Andrew Otto
>>  Wikimedia Foundation
>>
>> On Fri, Feb 23, 2024 at 1:58 AM Salva Alcántara <salcantara...@gmail.com> wrote:
>>
>>> I'm facing some issues related to schema evolution in combination with
>>> the usage of JSON Schemas, and I was wondering whether there are any
>>> recommended best practices.
>>>
>>> In particular, I'm using the following code generator:
>>>
>>> - https://github.com/joelittlejohn/jsonschema2pojo
>>>
>>> The main gotchas so far relate to the `additionalProperties` field. When
>>> setting it to true, the resulting POJO is not valid according to Flink's
>>> rules, because the generated getter/setter methods don't follow the
>>> Java Beans naming conventions; e.g., see here:
>>>
>>> - https://github.com/joelittlejohn/jsonschema2pojo/issues/1589
>>>
>>> This means that the Kryo fallback is used for serialization purposes,
>>> which is not only bad for performance but also breaks state schema
>>> evolution.
>>>
>>> Because of that, setting `additionalProperties` to `false` looks like a
>>> good idea, but then your job will break if an upstream/producer service
>>> adds a property to the messages you are reading. To solve this problem,
>>> the POJOs for your job (as a reader) can be generated so that the
>>> `additionalProperties` field is ignored (via the `@JsonIgnore` Jackson
>>> annotation). This seems to be a good overall solution to the problem,
>>> but it looks a bit convoluted to me and didn't come without some trial
>>> & error (= pain & frustration).
>>>
>>> Is anyone here facing similar issues? It would be good to hear your
>>> thoughts on this!
>>>
>>> BTW, this is a very interesting article that touches on the
>>> above-mentioned difficulties:
>>> - https://www.creekservice.org/articles/2024/01/09/json-schema-evolution-part-2.html
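[Editor's note: for context on why the generated code trips Flink up, the linked issue shows that jsonschema2pojo pairs a `getAdditionalProperties()` getter with a `setAdditionalProperty(String, Object)` setter (singular name, different signature), so Flink's POJO analyzer finds no matching getter/setter pair and falls back to Kryo. Below is a minimal plain-Java sketch, with a hypothetical `SensorEvent` type, of the shape Flink's POJO rules do accept; it is an illustration of the convention, not code from the thread.]

```java
// Sketch of a Flink-compatible POJO (hypothetical fields). Flink's
// PojoSerializer (which supports state schema evolution) requires a
// public class, a public no-arg constructor, and every non-public field
// exposed through a Java Beans pair: `T getX()` / `void setX(T)` for
// a field `x`, with matching names and types.
public class SensorEvent {
    // Public fields are accepted directly by the POJO analyzer.
    public String id;

    // A private field also works, as long as the accessors below match
    // the Beans convention exactly.
    private long timestampMs;

    public SensorEvent() {}  // the required public no-arg constructor

    public long getTimestampMs() {
        return timestampMs;
    }

    public void setTimestampMs(long timestampMs) {
        this.timestampMs = timestampMs;
    }
}
```

By contrast, the generated `setAdditionalProperty(String, Object)` has no getter of the same name and type, so the class fails this check, which is exactly why the Kryo fallback kicks in.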