Hi Shammon,

Unfortunately it’s a DataStream job. I’ve been exploring a few options but haven’t settled on one yet. I’m currently looking at whether I can leverage some type of partial serialization to bind to the properties that I know the job will use and retain the rest as a JSON blob. I’ve also considered storing the fields as a large map of string-object pairs and translating that into a string prior to writing to the sinks.
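
A rough sketch of the first idea, under stated assumptions: the property names `id` and `timestamp`, the class names, and the `toJson` helper are all hypothetical and only illustrate the shape. The known properties become plain typed fields, and everything else stays in a single `String` field holding an unparsed JSON object. A class like this follows Flink's POJO rules (public class, public no-argument constructor, public fields of simple types), so it should be treated as a POJO type rather than a generic type that falls back to Kryo. Splitting the incoming message into known fields and a remainder would still happen once at the source (for example with Gson, removing the known properties from the parsed object and calling `toString()` on what is left); that step is not shown here.

```java
// Hypothetical sketch: none of these names come from the thread.
final class PartialEventSketch {

    /** Known properties as typed fields, the rest as a raw JSON object string. */
    public static class PartialEvent {
        public String id;         // assumed known property
        public long timestamp;    // assumed known property
        public String remainder;  // all other properties, left unparsed, e.g. {"a":1}

        public PartialEvent() {}  // no-arg constructor, required for a Flink POJO

        public PartialEvent(String id, long timestamp, String remainder) {
            this.id = id;
            this.timestamp = timestamp;
            this.remainder = remainder;
        }
    }

    /** Re-assembles one JSON object string right before writing to a sink. */
    static String toJson(PartialEvent e) {
        String head = "{\"id\":" + quote(e.id) + ",\"timestamp\":" + e.timestamp;
        String rest = e.remainder == null ? "" : e.remainder.trim();
        if (rest.isEmpty()) {
            return head + "}";
        }
        if (!rest.startsWith("{") || !rest.endsWith("}")) {
            throw new IllegalArgumentException("remainder must be a JSON object");
        }
        // Drop the outer braces so the members can be spliced into the result.
        rest = rest.substring(1, rest.length() - 1).trim();
        return rest.isEmpty() ? head + "}" : head + "," + rest + "}";
    }

    /** Minimal JSON string escaping for the typed fields. */
    static String quote(String s) {
        if (s == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```

Enrichment then only touches the typed fields (for instance `event.timestamp = ...`), and the remainder is never parsed again between operators. The trade-off is that any property the job needs to read or change has to be promoted to a typed field up front.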

I’m still open to any and all ideas for handling this in an efficient, reasonable way.

Thanks,

Rion

On Mar 20, 2023, at 8:40 PM, Shammon FY <zjur...@gmail.com> wrote:


Hi Rion

Is your job a DataStream job or a Table/SQL job? If it is a Table/SQL job and you can define all the JSON fields you need, you can use the JSON format [1] directly to parse the data.

You can also write custom UDFs to parse JSON data into structured types such as map, row, and other types supported by Flink.



Best,
Shammon FY


On Sun, Mar 19, 2023 at 7:44 AM Rion Williams <rionmons...@gmail.com> wrote:
Hi all,

I’m reaching out today for some suggestions (and hopefully a solution) for a Flink job that I’m working on. The job itself reads JSON strings from a Kafka topic and parses them into JSONObjects (currently via Gson), which are then operated on before ultimately being written out to Kafka again.

The problem here is that the shape of the data can vary wildly and dynamically. Some records may have properties unique to only that record, which makes defining a POJO difficult. In addition to this, the JSONObjects fall back to Kryo serialization, which is leading to atrocious throughput.

I basically need to read in JSON strings, enrich properties on these objects, and ultimately write them to various sinks. Is there some type of JSON-based class or library or an approach I could use to accomplish this in an efficient manner? Or is there possibly a way to partially write a POJO that would allow me to interact with sections/properties of the JSON while retaining other properties that might be dynamically present or unique to the message?

Any advice or suggestions would be welcome! I’ll also be happy to provide any additional context if it would help!

Thanks,

Rion

(cross-posted to users+dev for reach)
