[jira] [Updated] (FLINK-32115) json_value support cache

luoyuxia (Jira) Mon, 22 May 2023 19:35:05 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-32115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


luoyuxia updated FLINK-32115:
-----------------------------
    Description: 
[https://github.com/apache/hive/blob/storage-branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java]

 

Hive caches the previously deserialized JSON object. Could we consider using 
a similar cache of parsed objects in JsonValueCallGen? 

 

This optimization can improve the performance of SQL such as

 

select 
json_value(A, 'xxx'),
json_value(A, 'yyy'),
json_value(A, 'zzz'),
...

where many json_value calls read the same column A.

 

I added a static LRU cache to SqlJsonUtils and refactored 
jsonValueExpression1 as follows:
{code:java}
private static JsonValueContext jsonValueExpression1(String input) {
    // Reuse the parsed result if this exact input string was seen before.
    JsonValueContext parsedJsonContext = EXTRACT_OBJECT_CACHE.get(input);
    if (parsedJsonContext != null) {
        return parsedJsonContext;
    }
    try {
        parsedJsonContext = JsonValueContext.withJavaObj(dejsonize(input));
    } catch (Exception e) {
        // A failed parse is cached too, so malformed input is not re-parsed.
        parsedJsonContext = JsonValueContext.withException(e);
    }

    EXTRACT_OBJECT_CACHE.put(input, parsedJsonContext);
    return parsedJsonContext;
}
{code}
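
The definition of EXTRACT_OBJECT_CACHE is not shown here. As a rough illustration only, a bounded LRU map can be built from the JDK's LinkedHashMap in access order. The class name LruCacheSketch, the method newLruCache and the capacity used below are assumptions made for this sketch, and the actual patch may be implemented differently.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; not the cache used in the actual patch. */
public class LruCacheSketch {

    /** Builds a thread-safe LRU map that holds at most maxEntries entries. */
    public static <K, V> Map<K, V> newLruCache(final int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently accessed.
        LinkedHashMap<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > maxEntries;
            }
        };
        // LinkedHashMap is not thread-safe, and in access order even get() mutates it,
        // so a shared static instance needs external synchronization.
        return Collections.synchronizedMap(lru);
    }

    public static void main(String[] args) {
        Map<String, String> cache = newLruCache(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a" so that "b" becomes the eldest entry
        cache.put("c", "3"); // exceeds the limit, so "b" is evicted
        System.out.println(cache.keySet());
    }
}
```

A static cache of this kind would be shared by every caller in the same JVM and is keyed on the full input string, so the capacity also bounds how many JSON strings stay reachable in memory.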
 

and benchmarked it as follows:
{code:java}
public static void main(String[] args) {
    String input =
            "{\"social\":[{\"weibo\":\"https://weibo.com/xiaoming\"},{\"github\":\"https://github.com/xiaoming\"}]}";

    // Call the function 1,000,000 times on the same input and print the elapsed milliseconds.
    long start = System.currentTimeMillis();
    for (int i = 0; i < 1000000; i++) {
        Object dejsonize = jsonValueExpression1(input);
    }
    System.err.println(System.currentTimeMillis() - start);
}
{code}
 

The time taken by the two benchmark cases is:
||Case||Time taken (ms)||
|cache|33|
|no cache|1591|
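
The benchmark reuses one input string for all 1,000,000 iterations, so every call after the first is a cache hit and the figures reflect the best case. For the SQL pattern described earlier, where several json_value calls read the same column of each row, the saving comes from parsing each row once instead of once per call. The sketch below counts parses rather than measuring time. The class ParseCountSketch, the stand-in parse method and the row and call counts are assumptions made for illustration and are not part of Flink.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: counts parses instead of measuring time. */
public class ParseCountSketch {

    static int parseCount = 0;

    /** Stand-in for the real JSON parser; it only records that a parse happened. */
    static Object parse(String json) {
        parseCount++;
        return json.length();
    }

    /**
     * Simulates callsPerRow lookups against each of `rows` distinct inputs.
     * A cacheSize of 0 disables caching. Returns the number of parses performed.
     */
    public static int run(int rows, int callsPerRow, final int cacheSize) {
        parseCount = 0;
        Map<String, Object> cache = new LinkedHashMap<String, Object>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                return size() > cacheSize;
            }
        };
        for (int r = 0; r < rows; r++) {
            String input = "{\"id\":" + r + "}";
            for (int c = 0; c < callsPerRow; c++) {
                Object parsed = cacheSize > 0 ? cache.get(input) : null;
                if (parsed == null) {
                    parsed = parse(input);
                    if (cacheSize > 0) {
                        cache.put(input, parsed);
                    }
                }
            }
        }
        return parseCount;
    }

    public static void main(String[] args) {
        System.out.println("no cache: " + run(1000, 3, 0));
        System.out.println("cache:    " + run(1000, 3, 1));
    }
}
```

With 1000 distinct rows and three calls per row, this performs 3000 parses without a cache and 1000 with a single-entry cache, which suggests that even a very small cache covers the repeated-column case when the calls for one row run back to back.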

 



> json_value support cache
> ------------------------
>
>                 Key: FLINK-32115
>                 URL: https://issues.apache.org/jira/browse/FLINK-32115
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Runtime
>    Affects Versions: 1.16.1
>            Reporter: xiaogang zhou
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
