Hi Yaroslav, Thanks for your reply. How is performance ,
Can I have a sample code for the same. On Thu, Jul 21, 2022, 9:31 PM Yaroslav Tkachenko <yaros...@goldsky.io> wrote: > Hi Soumen, > > I'd try parsing the input using the DataStream API (with a fast JSON > library) and then converting it to a Table. > > On Thu, Jul 21, 2022 at 6:22 AM Soumen Choudhury <sou....@gmail.com> > wrote: > >> We have a requirement of parsing a very complex json (size around 25 kb >> per event) event with a predefined schema (nested schema, with multiple >> schema files ) and create a temporary table and from temp table we have to >> apply some case statement based some fields( eg. to find out success, >> failure count , status code ) and do a aggregation in 1 sec interval. >> >> We have tried with inbuilt *JSON_VALUE* function to retrieve some field >> value and then apply the case statement, but as I am using JSON_VALUE more >> than 5/6 times, the application is performing very slow. >> >> For some other filtering use case we are able to receive more that 1600 >> event/sec, but for this case we are only receiving around 300 event/sec for >> 1 core . >> >> Below is the query example: >> >> *Query 1:* >> >> "select cast(JSON_QUERY(message, '$.eventRecordHeader.Result'), bigint) >> AS result1, JSON_QUERY(message, '$.eventRecordHeader.Cause.ErrorCode' ) >> AS errorCode, JSON_QUERY(message, '$.eventRecordHeader.Cause.SubCause' ) AS >> subCause, JSON_QUERY(message, >> '$.eventRecordHeader.Cause.SubCause.SubProtocol' ) AS subProtocol, >> JSON_QUERY(message, '$.eventRecordHeader.Cause.SubCause.SubError' ) AS >> subError, TO_TIMESTAMP_LTZ(cast(JSON_QUERY(message, >> '$.eventRecordHeader.StartTime') as bigint)/1000, 3) AS eventTime, >> proctime() as proctime from kafkaJsonSource", >> >> *Query 2:* >> >> select count(case when result1=1 then 1 else null end) >> failed_result,count(case when result1=0 then 1 else null end) >> successful_result,count(case when errorCode like '4%' then 1 else null end) >> err_starts_4,count(case when errorCode like '5%' then 1 else null end) >> err_starts_5,count(case when errorCode like '6%' then 1 else null end) >> err_starts_6,count(case when subCause is not null then 1 else null end) >> has_sub_cause,count(case when subProtocol='DNS' then 1 else null end) >> protocol_dns, count(case when subProtocol='Diameter' then 1 else null end) >> protocol_diameter, count(case when (subProtocol='Diameter' and subError >> like '3%') then 1 else null end) protocol_diameter_err_starts_3,count(case >> when (subProtocol='Diameter' and subError like '4%') then 1 else null end) >> protocol_diameter_err_starts_4,count(case when (subProtocol='Diameter' and >> subError like '5%') then 1 else null end) protocol_diameter_err_starts_5 >> FROM TABLE(TUMBLE(TABLE filter_transformed, DESCRIPTOR(proctime), INTERVAL >> '1' SECOND)) GROUP BY window_start, window_end; >> >> Please someone let use know, if there is some better way to do this . >> >> -- >> Regards >> Soumen Choudhury >> Cell : +91865316168 >> mail to : sou....@gmail.com >> >