Hi Rion,

I’m using Gson to deserialize to a Map<String, JsonElement>.
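For reference, a minimal sketch of that deserialization (Gson 2.x assumed; the helper class name is illustrative):

    // Parse arbitrary JSON into a Map<String, JsonElement>, so properties
    // that vary per record stay accessible without defining a fixed POJO.
    import com.google.gson.Gson;
    import com.google.gson.JsonElement;
    import com.google.gson.reflect.TypeToken;
    import java.lang.reflect.Type;
    import java.util.Map;

    public class JsonToMap {
        private static final Gson GSON = new Gson();
        private static final Type MAP_TYPE =
                new TypeToken<Map<String, JsonElement>>() {}.getType();

        public static Map<String, JsonElement> parse(String json) {
            return GSON.fromJson(json, MAP_TYPE);
        }
    }

Each entry can then be inspected with JsonElement accessors such as isJsonPrimitive() or getAsString().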
1-2 records/second sounds way too slow, unless each record is enormous.

— Ken

> On Mar 21, 2023, at 6:18 AM, Rion Williams <rionmons...@gmail.com> wrote:
>
> Hi Ken,
>
> Thanks for the response. I hadn't tried exploring the use of the Record
> class, by which I'm assuming you mean flink.types.Record, to read the
> JSON into. Did you handle this by using a mapper (e.g. Gson, Jackson) to
> read the properties in as fields, or did you take a different approach?
> Additionally, how has your experience been with performance? Kryo with
> the existing job leveraging JsonObjects (via Gson) is horrific (~1-2
> records/second) and can't keep up with the speed of the producers, which
> is the impetus behind reevaluating the serialization.
>
> I'll explore this a bit more.
>
> Thanks,
>
> Rion
>
> On Mon, Mar 20, 2023 at 10:28 PM Ken Krugler <kkrugler_li...@transpac.com> wrote:
> Hi Rion,
>
> For my similar use case, I was able to make a simplifying assumption
> that my top-level JSON object was a record.
>
> I then registered a custom Kryo serde that knew how to handle the
> handful of JsonPrimitive types for the record entries.
>
> I recently looked at extending that to support arrays and nested
> records, but haven't had to do that.
>
> — Ken
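For illustration, a hedged sketch of the custom Kryo serde idea above, registered with Flink. This is a simpler variant than the per-JsonPrimitive handling Ken describes: it round-trips any JsonElement through its compact string form, which also covers arrays and nested records (Gson 2.8.6+ for JsonParser.parseString and registration via Flink's ExecutionConfig are assumed; names are illustrative):

    // Kryo serializer that writes a JsonElement as its JSON string and
    // re-parses it on read.
    import com.esotericsoftware.kryo.Kryo;
    import com.esotericsoftware.kryo.Serializer;
    import com.esotericsoftware.kryo.io.Input;
    import com.esotericsoftware.kryo.io.Output;
    import com.google.gson.JsonElement;
    import com.google.gson.JsonParser;

    public class JsonElementKryoSerializer extends Serializer<JsonElement> {

        @Override
        public void write(Kryo kryo, Output output, JsonElement element) {
            // Compact JSON string form; handles primitives, arrays,
            // and nested objects uniformly.
            output.writeString(element.toString());
        }

        @Override
        public JsonElement read(Kryo kryo, Input input, Class<JsonElement> type) {
            return JsonParser.parseString(input.readString());
        }
    }

    // Registered as a default serializer so it also applies to subclasses
    // (JsonObject, JsonArray, JsonPrimitive), e.g.:
    // env.getConfig().addDefaultKryoSerializer(JsonElement.class,
    //         JsonElementKryoSerializer.class);

The string round trip trades some CPU for simplicity; a per-type serde like the one described above avoids re-parsing but needs a case per JsonPrimitive variant.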
>
>> On Mar 20, 2023, at 6:56 PM, Rion Williams <rionmons...@gmail.com> wrote:
>>
>> Hi Shammon,
>>
>> Unfortunately it's a DataStream job. I've been exploring a few options
>> but haven't found anything I've decided on yet. I'm currently looking at
>> seeing if I can leverage some type of partial serialization to bind to
>> the properties that I know the job will use and retain the rest as a
>> JSON blob. I've also considered trying to store the fields as a large
>> map of string-object pairs and translating that into a string prior to
>> writing to the sinks.
>>
>> Still accepting any/all ideas to see if I can handle this in an
>> efficient, reasonable way.
>>
>> Thanks,
>>
>> Rion
>>
>>> On Mar 20, 2023, at 8:40 PM, Shammon FY <zjur...@gmail.com> wrote:
>>>
>>> Hi Rion,
>>>
>>> Is your job a DataStream or Table/SQL job? If it is a Table/SQL job and
>>> you can define all the fields you need from the JSON, then you can use
>>> the JSON format [1] directly to parse the data.
>>>
>>> You can also write custom UDFs to parse the JSON data into structured
>>> data, such as map, row, and the other types supported by Flink.
>>>
>>> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/json/
>>>
>>> Best,
>>> Shammon FY
>>>
>>> On Sun, Mar 19, 2023 at 7:44 AM Rion Williams <rionmons...@gmail.com> wrote:
>>> Hi all,
>>>
>>> I'm reaching out today for some suggestions (and hopefully a solution)
>>> for a Flink job that I'm working on. The job reads JSON strings from a
>>> Kafka topic into JsonObjects (currently via Gson), which are then
>>> operated against before ultimately being written out to Kafka again.
>>>
>>> The problem here is that the shape of the data can vary wildly and
>>> dynamically. Some records may have properties unique to only that
>>> record, which makes defining a POJO difficult. In addition, the
>>> JsonObjects fall back to Kryo serialization, which is leading to
>>> atrocious throughput.
>>>
>>> I basically need to read in JSON strings, enrich properties on these
>>> objects, and ultimately write them to various sinks. Is there some type
>>> of JSON-based class or library, or an approach, I could use to
>>> accomplish this efficiently? Or is there possibly a way to partially
>>> define a POJO that would let me interact with known sections/properties
>>> of the JSON while retaining other properties that might be dynamically
>>> present or unique to the message?
>>>
>>> Any advice or suggestions would be welcome! I'll also be happy to
>>> provide any additional context if it would help!
>>>
>>> Thanks,
>>>
>>> Rion
>>>
>>> (cross-posted to users+dev for reach)
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> Custom big data solutions
> Flink, Pinot, Solr, Elasticsearch


--------------------------
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch
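As an illustration of the partial-binding idea Rion asks about (typed access to the properties the job uses, with everything else retained and written back out), a hedged sketch using Jackson's any-getter/any-setter; the event shape and field names are invented for the example:

    import com.fasterxml.jackson.annotation.JsonAnyGetter;
    import com.fasterxml.jackson.annotation.JsonAnySetter;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import java.util.HashMap;
    import java.util.Map;

    public class PartialEvent {
        // Known properties the job actually operates on.
        public String id;
        public long timestamp;

        // Everything else is captured here and survives the round trip.
        private final Map<String, JsonNode> other = new HashMap<>();

        @JsonAnySetter
        public void setOther(String name, JsonNode value) {
            other.put(name, value);
        }

        @JsonAnyGetter
        public Map<String, JsonNode> getOther() {
            return other;
        }

        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            PartialEvent event = mapper.readValue(
                    "{\"id\":\"a1\",\"timestamp\":1,\"extra\":{\"x\":true}}",
                    PartialEvent.class);
            event.timestamp += 1000; // enrich a known field
            // Prints something like:
            // {"id":"a1","timestamp":1001,"extra":{"x":true}}
            System.out.println(mapper.writeValueAsString(event));
        }
    }

One caveat: Flink's POJO serializer would likely still reject this class as written (the catch-all map lacks a standard getter/setter pair), so the binding pattern shown here does not by itself settle the serialization question.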