Hi Rion,

I’m using Gson to deserialize to a Map<String, JsonElement>.
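
Something along these lines (a minimal, untested sketch; the wrapper class
is just for illustration):

import com.google.gson.Gson;
import com.google.gson.JsonElement;
import com.google.gson.reflect.TypeToken;

import java.lang.reflect.Type;
import java.util.Map;

public class JsonMapParser {
    private static final Gson GSON = new Gson();

    // Target type: a top-level JSON object as a map of its entries.
    private static final Type MAP_TYPE =
            new TypeToken<Map<String, JsonElement>>() {}.getType();

    public static Map<String, JsonElement> parse(String json) {
        return GSON.fromJson(json, MAP_TYPE);
    }
}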

1-2 records/second sounds way too slow, unless each record is enormous.

— Ken

> On Mar 21, 2023, at 6:18 AM, Rion Williams <rionmons...@gmail.com> wrote:
> 
> Hi Ken,
> 
> Thanks for the response. I hadn't tried exploring the use of the Record 
> class (I'm assuming you're referring to flink.types.Record) to read the 
> JSON into. Did you handle this by using a mapper (e.g. Gson, Jackson) to 
> read the properties in as fields, or did you take a different approach? 
> Additionally, how has your experience been with performance? Kryo in the 
> existing job, which leverages JsonObjects (via Gson), is horrific (~1-2 
> records/second) and can't keep up with the speed of the producers, which 
> is the impetus behind reevaluating the serialization.
> 
> I'll explore this a bit more.
> 
> Thanks,
> 
> Rion
> 
> On Mon, Mar 20, 2023 at 10:28 PM Ken Krugler <kkrugler_li...@transpac.com> wrote:
> Hi Rion,
> 
> For my similar use case, I was able to make a simplifying assumption that my 
> top-level JSON object was a record.
> 
> I then registered a custom Kryo serde that knew how to handle the handful of 
> JsonPrimitive types for the record entries.
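> 
> Roughly like this, for illustration (an untested sketch, not my exact code; 
> the type-tag encoding is just one possibility):
> 
> import com.esotericsoftware.kryo.Kryo;
> import com.esotericsoftware.kryo.Serializer;
> import com.esotericsoftware.kryo.io.Input;
> import com.esotericsoftware.kryo.io.Output;
> import com.google.gson.JsonPrimitive;
> 
> import java.math.BigDecimal;
> 
> public class JsonPrimitiveSerializer extends Serializer<JsonPrimitive> {
>     @Override
>     public void write(Kryo kryo, Output output, JsonPrimitive value) {
>         // Tag the primitive's type, then write its value.
>         if (value.isBoolean()) {
>             output.writeByte(0);
>             output.writeBoolean(value.getAsBoolean());
>         } else if (value.isNumber()) {
>             output.writeByte(1);
>             output.writeString(value.getAsNumber().toString());
>         } else {
>             output.writeByte(2);
>             output.writeString(value.getAsString());
>         }
>     }
> 
>     @Override
>     public JsonPrimitive read(Kryo kryo, Input input, Class<JsonPrimitive> type) {
>         switch (input.readByte()) {
>             case 0: return new JsonPrimitive(input.readBoolean());
>             case 1: return new JsonPrimitive(new BigDecimal(input.readString()));
>             default: return new JsonPrimitive(input.readString());
>         }
>     }
> }
> 
> And registered with the environment, e.g.
> env.getConfig().registerTypeWithKryoSerializer(JsonPrimitive.class,
> JsonPrimitiveSerializer.class).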
> 
> I recently looked at extending that to support arrays and nested records, but 
> haven’t had to do that.
> 
> — Ken
> 
> 
>> On Mar 20, 2023, at 6:56 PM, Rion Williams <rionmons...@gmail.com> wrote:
>> 
>> Hi Shammon,
>> 
>> Unfortunately it's a DataStream job. I've been exploring a few options but 
>> haven't settled on anything yet. I'm currently looking at whether I can 
>> leverage some type of partial serialization to bind to the properties that 
>> I know the job will use and retain the rest as a JSON blob. I've also 
>> considered storing the fields as a large map of string-object pairs and 
>> translating that into a string prior to writing to the sinks (roughly 
>> sketched below).
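>> 
>> Something like this rough, untested sketch (assuming the top-level values 
>> are primitives that Gson can coerce to strings; the class and method names 
>> are made up):
>> 
>> import com.google.gson.Gson;
>> import com.google.gson.reflect.TypeToken;
>> import org.apache.flink.api.common.typeinfo.Types;
>> import org.apache.flink.streaming.api.datastream.DataStream;
>> 
>> import java.lang.reflect.Type;
>> import java.util.Map;
>> 
>> public class MapOfFields {
>>     private static final Gson GSON = new Gson();
>> 
>>     // Nested objects/arrays would need different handling; numbers also
>>     // come back out re-encoded as strings.
>>     private static final Type MAP_TYPE =
>>             new TypeToken<Map<String, String>>() {}.getType();
>> 
>>     public static DataStream<String> pipeline(DataStream<String> source) {
>>         return source
>>                 .map(json -> GSON.<Map<String, String>>fromJson(json, MAP_TYPE))
>>                 // The explicit type info keeps Flink on its native map and
>>                 // string serializers instead of falling back to Kryo.
>>                 .returns(Types.MAP(Types.STRING, Types.STRING))
>>                 // ... enrichment against the map would go here ...
>>                 .map(GSON::toJson)
>>                 .returns(Types.STRING);
>>     }
>> }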
>> 
>> I'm still open to any and all ideas for handling this in an efficient, 
>> reasonable way.
>> 
>> Thanks,
>> 
>> Rion
>> 
>>> On Mar 20, 2023, at 8:40 PM, Shammon FY <zjur...@gmail.com> wrote:
>>> 
>>> Hi Rion,
>>> 
>>> Is your job a DataStream or Table/SQL job? If it is a Table/SQL job and 
>>> you can define all the JSON fields you need, then you can directly use 
>>> the json format [1] to parse the data.
>>> 
>>> You can also write custom UDFs to parse the JSON data into structured 
>>> data such as map, row, and other types supported by Flink.
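>>> 
>>> For example, something like this illustrative sketch (the table, topic, 
>>> and option values are made up):
>>> 
>>> import org.apache.flink.table.api.EnvironmentSettings;
>>> import org.apache.flink.table.api.TableEnvironment;
>>> 
>>> public class JsonFormatExample {
>>>     public static void main(String[] args) {
>>>         TableEnvironment tEnv =
>>>                 TableEnvironment.create(EnvironmentSettings.inStreamingMode());
>>> 
>>>         // Declare only the fields the job needs; the json format simply
>>>         // skips undeclared properties in each record.
>>>         tEnv.executeSql(
>>>                 "CREATE TABLE events (" +
>>>                 "  id STRING," +
>>>                 "  payload MAP<STRING, STRING>" +
>>>                 ") WITH (" +
>>>                 "  'connector' = 'kafka'," +
>>>                 "  'topic' = 'events'," +
>>>                 "  'properties.bootstrap.servers' = 'localhost:9092'," +
>>>                 "  'properties.group.id' = 'json-demo'," +
>>>                 "  'format' = 'json'," +
>>>                 "  'json.ignore-parse-errors' = 'true'" +
>>>                 ")");
>>>     }
>>> }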
>>> 
>>> 
>>> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/json/
>>> 
>>> Best,
>>> Shammon FY
>>> 
>>> 
>>> On Sun, Mar 19, 2023 at 7:44 AM Rion Williams <rionmons...@gmail.com> wrote:
>>> Hi all,
>>> 
>>> I’m reaching out today for some suggestions (and hopefully a solution) for 
>>> a Flink job that I'm working on. The job reads JSON strings from a Kafka 
>>> topic, parses them into JsonObjects (currently via Gson), operates on 
>>> those objects, and ultimately writes them back out to Kafka.
>>> 
>>> The problem here is that the shape of the data can vary wildly and 
>>> dynamically. Some records may have properties unique to only that record, 
>>> which makes defining a POJO difficult. In addition, the JsonObjects fall 
>>> back to Kryo serialization, which is leading to atrocious throughput.
>>> 
>>> I basically need to read in JSON strings, enrich properties on these 
>>> objects, and ultimately write them to various sinks.  Is there some type of 
>>> JSON-based class or library or an approach I could use to accomplish this 
>>> in an efficient manner? Or is there possibly a way to partially define a 
>>> POJO that would allow me to interact with sections/properties of the JSON 
>>> while retaining other properties that might be dynamically present or 
>>> unique to the message?
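>>> 
>>> Something like this rough, untested illustration of what I mean, using 
>>> Jackson's any-getter/any-setter (names are made up):
>>> 
>>> import com.fasterxml.jackson.annotation.JsonAnyGetter;
>>> import com.fasterxml.jackson.annotation.JsonAnySetter;
>>> 
>>> import java.util.HashMap;
>>> import java.util.Map;
>>> 
>>> public class Event {
>>>     // Known, typed properties the job operates on.
>>>     public String id;
>>>     public long timestamp;
>>> 
>>>     // Catch-all for properties unique to a given record; Jackson
>>>     // round-trips these on read and write.
>>>     private final Map<String, Object> extras = new HashMap<>();
>>> 
>>>     @JsonAnySetter
>>>     public void set(String name, Object value) {
>>>         extras.put(name, value);
>>>     }
>>> 
>>>     @JsonAnyGetter
>>>     public Map<String, Object> getExtras() {
>>>         return extras;
>>>     }
>>> }
>>> 
>>> (Whether Flink would treat that as a clean POJO or still fall back to 
>>> Kryo for the extras map is part of what I'd need to verify.)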
>>> 
>>> Any advice or suggestions would be welcome! I’ll also be happy to provide 
>>> any additional context if it would help!
>>> 
>>> Thanks,
>>> 
>>> Rion
>>> 
>>> (cross-posted to users+dev for reach)
> 
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> Custom big data solutions
> Flink, Pinot, Solr, Elasticsearch
> 
> 
> 

--------------------------
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch


