> the following code generator

Oh, and FWIW we avoid code generation and POJOs, and instead rely on Flink's Row or RowData abstractions.
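To illustrate why a schema-driven, positional record sidesteps the additionalProperties problem, here is a minimal plain-JDK sketch. It is not Wikimedia's serde and not Flink's Row API. The class `ReaderProjection`, the method `project`, and the field names are hypothetical. The assumption is that the reader's schema defines an ordered field list (as a Row type would). Any property the producer adds later simply has no slot and is dropped, while a field missing from the record becomes null.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReaderProjection {

    /**
     * Hypothetical helper: projects a parsed JSON object onto the reader's ordered
     * field list, yielding positional values (similar in spirit to a Flink Row).
     * Unknown properties are ignored; absent fields map to null.
     */
    static Object[] project(List<String> readerFields, Map<String, Object> record) {
        Object[] values = new Object[readerFields.size()];
        for (int i = 0; i < readerFields.size(); i++) {
            values[i] = record.get(readerFields.get(i)); // null if the record lacks this field
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical reader schema: only these two fields, in this order.
        List<String> readerFields = List.of("id", "page_title");

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 42);
        record.put("page_title", "Flink");
        record.put("added_by_producer", true); // new upstream field, unknown to this reader

        System.out.println(Arrays.toString(project(readerFields, record))); // prints [42, Flink]
    }
}
```

Because the record's shape comes from the reader's schema rather than from a generated class, a producer adding a field changes nothing on the reader side.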
On Sun, Feb 25, 2024 at 10:35 AM Andrew Otto <o...@wikimedia.org> wrote:

> Hi!
>
> I'm not sure if this is totally relevant for you, but we use JSONSchema
> and JSON with Flink at the Wikimedia Foundation.
> We explicitly disallow the use of additionalProperties
> <https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#No_object_additionalProperties>,
> unless it is to define Map type fields
> <https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#map_types>
> (where additionalProperties itself is a schema).
>
> We have JSONSchema converters and JSON Serdes to be able to use our
> JSONSchemas and JSON records with both the DataStream API (as Row) and
> Table API (as RowData).
>
> See:
> - https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json
> - https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/#managing-a-object
>
> State schema evolution is supported via the EventRowTypeInfo wrapper
> <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/EventRowTypeInfo.java#42>.
>
> Less directly about Flink: I gave a talk at Confluent's Current conf in
> 2022 about why we use JSONSchema
> <https://www.confluent.io/events/current-2022/wikipedias-event-data-platform-or-json-is-okay-too/>.
> See also this blog post series if you are interested
> <https://techblog.wikimedia.org/2020/09/10/wikimedias-event-data-platform-or-json-is-ok-too/>!
>
> -Andrew Otto
> Wikimedia Foundation
>
>
> On Fri, Feb 23, 2024 at 1:58 AM Salva Alcántara <salcantara...@gmail.com>
> wrote:
>
>> I'm facing some issues related to schema evolution in combination with
>> JSON Schemas, and I was just wondering whether there are any
>> recommended best practices.
>>
>> In particular, I'm using the following code generator:
>>
>> - https://github.com/joelittlejohn/jsonschema2pojo
>>
>> The main gotchas so far relate to the `additionalProperties` field. When
>> it is set to true, the resulting POJO is not valid according to Flink's
>> rules because the generated getter/setter methods don't follow the Java
>> beans naming conventions, e.g., see here:
>>
>> - https://github.com/joelittlejohn/jsonschema2pojo/issues/1589
>>
>> This means that the Kryo fallback is used for serialization, which is
>> not only bad for performance but also breaks state schema evolution.
>>
>> Because of that, setting `additionalProperties` to `false` looks like
>> a good idea, but then your job will break if an upstream/producer service
>> adds a property to the messages you are reading. To solve this problem, the
>> POJOs for your job (as a reader) can be generated to ignore the
>> `additionalProperties` field (via the `@JsonIgnore` Jackson annotation).
>> This seems to be a good overall solution, but it looks a bit convoluted
>> to me and didn't come without some trial & error (= pain & frustration).
>>
>> Is anyone else here facing similar issues? It would be good to hear your
>> thoughts on this!
>>
>> BTW, this is a very interesting article that touches on the above-mentioned
>> difficulties:
>> - https://www.creekservice.org/articles/2024/01/09/json-schema-evolution-part-2.html
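The Kryo fallback described in the quoted message follows from Flink's POJO rules. Every non-static, non-transient field must be public (and non-final) or have a public getter and setter that follow Java beans naming. The sketch below approximates only that field rule using the JDK's `java.beans.Introspector`, which is not how Flink's own type extraction is implemented. It runs the check against a hypothetical `Event` class shaped like what jsonschema2pojo is assumed to emit for `additionalProperties: true`. That shape is a bean getter `getAdditionalProperties()` plus a two-argument `setAdditionalProperty(name, value)`, with the Jackson annotations omitted so it compiles with the JDK alone. `PojoRuleCheck` and `nonPojoFields` are likewise illustrative names.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PojoRuleCheck {

    // Hypothetical class modeled on generated code for additionalProperties: true.
    public static class Event {
        private String id;
        private Map<String, Object> additionalProperties = new HashMap<>();

        public Event() {}

        public String getId() { return id; }
        public void setId(String id) { this.id = id; }

        // A bean-style getter exists...
        public Map<String, Object> getAdditionalProperties() { return additionalProperties; }

        // ...but this takes (name, value), so it is NOT a bean setter for the field.
        public void setAdditionalProperty(String name, Object value) {
            additionalProperties.put(name, value);
        }
    }

    /** Returns fields that are neither public nor backed by a bean getter AND setter. */
    static List<String> nonPojoFields(Class<?> clazz) {
        BeanInfo info;
        try {
            info = Introspector.getBeanInfo(clazz, Object.class);
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
        Map<String, PropertyDescriptor> props = new HashMap<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            props.put(pd.getName(), pd);
        }
        List<String> violations = new ArrayList<>();
        for (Field f : clazz.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) continue; // exempt
            if (Modifier.isPublic(mods) && !Modifier.isFinal(mods)) continue;    // directly accessible
            PropertyDescriptor pd = props.get(f.getName());
            if (pd == null || pd.getReadMethod() == null || pd.getWriteMethod() == null) {
                violations.add(f.getName());
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println(nonPojoFields(Event.class)); // prints [additionalProperties]
    }
}
```

A single violating field is enough for Flink to treat the whole class as a generic type, which is why dropping or ignoring `additionalProperties` on the reader side restores POJO serialization.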