Hi Rion,

Is your job DataStream or Table/SQL? If it is a Table/SQL job and you can define all of the JSON fields you need, then you can directly use the json format [1] to parse the data.
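For example, something along these lines (just a minimal sketch -- the topic name, field names, and types are placeholders for illustration, not your actual schema):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class JsonFormatExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Declare only the JSON fields you care about; the json format
        // parses them out of the Kafka records for you.
        tEnv.executeSql(
                "CREATE TABLE events (\n"
                + "  id STRING,\n"
                + "  event_time TIMESTAMP(3),\n"
                + "  payload ROW<user_id STRING, action STRING>\n"
                + ") WITH (\n"
                + "  'connector' = 'kafka',\n"
                + "  'topic' = 'input-topic',\n"
                + "  'properties.bootstrap.servers' = 'localhost:9092',\n"
                + "  'scan.startup.mode' = 'earliest-offset',\n"
                + "  'format' = 'json',\n"
                + "  'json.ignore-parse-errors' = 'true'\n"
                + ")");

        // Enrich / transform in SQL and write to a sink table defined the same way, e.g.:
        // tEnv.executeSql("INSERT INTO enriched_events SELECT ... FROM events");
    }
}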
You can also write a custom UDF to parse the JSON data into structured types such as MAP, ROW, and the other types supported by Flink (I have appended a rough sketch of such a UDF at the very end of this mail, below your quoted message).

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/json/

Best,
Shammon FY

On Sun, Mar 19, 2023 at 7:44 AM Rion Williams <rionmons...@gmail.com> wrote:
> Hi all,
>
> I’m reaching out today for some suggestions (and hopefully a solution) for
> a Flink job that I’m working on. The job itself reads JSON strings from a
> Kafka topic and parses those into JSONObjects (currently via Gson), which
> are then operated against, before ultimately being written out to Kafka
> again.
>
> The problem here is that the shape of the data can vary wildly and
> dynamically. Some records may have properties unique to only that record,
> which makes defining a POJO difficult. In addition to this, the JSONObjects
> fall back to Kryo serialization, which is leading to atrocious throughput.
>
> I basically need to read in JSON strings, enrich properties on these
> objects, and ultimately write them to various sinks. Is there some type of
> JSON-based class or library or an approach I could use to accomplish this
> in an efficient manner? Or possibly a way to partially define a POJO that
> would allow me to interact with sections/properties of the JSON while
> retaining other properties that might be dynamically present or unique to
> the message?
>
> Any advice or suggestions would be welcome! I’ll also be happy to provide
> any additional context if it would help!
>
> Thanks,
>
> Rion
>
> (cross-posted to users+dev for reach)
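PS: here is the rough UDF sketch mentioned above. The function name, the MAP<STRING, STRING> return type, and the use of Jackson are just my assumptions for illustration; any JSON library you already have on the classpath would work:

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.functions.ScalarFunction;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonToMapFunction extends ScalarFunction {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Turns a raw JSON string into a MAP<STRING, STRING> of its top-level
    // fields, so dynamic/unknown properties survive without a fixed POJO.
    @DataTypeHint("MAP<STRING, STRING>")
    public Map<String, String> eval(String json) {
        Map<String, String> result = new HashMap<>();
        if (json == null) {
            return result;
        }
        try {
            JsonNode root = MAPPER.readTree(json);
            root.fields().forEachRemaining(e ->
                    result.put(e.getKey(), e.getValue().toString()));
        } catch (Exception e) {
            // Keep bad records instead of failing the job; adjust as you see fit.
            result.put("_raw", json);
        }
        return result;
    }
}

You would then register it with tEnv.createTemporarySystemFunction("JSON_TO_MAP", JsonToMapFunction.class) and call JSON_TO_MAP(raw_json) in your queries.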