[ https://issues.apache.org/jira/browse/FLINK-32115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
luoyuxia updated FLINK-32115:
-----------------------------

> json_value support cache
> ------------------------
>
>                 Key: FLINK-32115
>                 URL: https://issues.apache.org/jira/browse/FLINK-32115
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Runtime
>    Affects Versions: 1.16.1
>            Reporter: xiaogang zhou
>            Priority: Major

Description:
Hive's GenericUDTFJSONTuple ([https://github.com/apache/hive/blob/storage-branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java]) caches previously deserialized JSON objects. Could we consider using a similar object cache in JsonValueCallGen?

This optimization can greatly improve the performance of queries that call json_value several times on the same column, such as:
{code:sql}
select
  json_value(A, 'xxx'),
  json_value(A, 'yyy'),
  json_value(A, 'zzz'),
  ...
{code}
Without a cache, each of these calls deserializes A again.

I added a static LRU cache (EXTRACT_OBJECT_CACHE) to SqlJsonUtils and refactored jsonValueExpression1 as follows:
{code:java}
// EXTRACT_OBJECT_CACHE is the static LRU cache added to SqlJsonUtils, keyed by the raw JSON
// string. Because it is static, it is shared by every thread using the class and needs to be thread-safe.
private static JsonValueContext jsonValueExpression1(String input) {
    // Cache hit: reuse the result of an earlier parse of the same string.
    JsonValueContext parsedJsonContext = EXTRACT_OBJECT_CACHE.get(input);
    if (parsedJsonContext != null) {
        return parsedJsonContext;
    }
    // Cache miss: parse the input; a parse failure is wrapped and cached as well.
    try {
        parsedJsonContext = JsonValueContext.withJavaObj(dejsonize(input));
    } catch (Exception e) {
        parsedJsonContext = JsonValueContext.withException(e);
    }
    EXTRACT_OBJECT_CACHE.put(input, parsedJsonContext);
    return parsedJsonContext;
}
{code}

I benchmarked it with:
{code:java}
// Assumed to live in SqlJsonUtils, so that the private jsonValueExpression1 is accessible.
public static void main(String[] args) {
    String input = "{\"social\":[{\"weibo\":\"https://weibo.com/xiaoming\"},{\"github\":\"https://github.com/xiaoming\"}]}";
    long start = System.currentTimeMillis();
    // The same input is used on every iteration, so with the cache every call after the first is a hit.
    for (int i = 0; i < 1000000; i++) {
        JsonValueContext parsed = jsonValueExpression1(input);
    }
    System.err.println(System.currentTimeMillis() - start);
}
{code}

Time taken by the two benchmark runs:
||Case||Time taken (ms)||
|Cache|33|
|No cache|1591|
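The caching idea can be illustrated outside Flink. The following is a hypothetical, self-contained sketch, not the patch attached to this issue and not Flink's SqlJsonUtils: ParseResultCache, Parser and Result are made-up names, the capacity is an assumed parameter (the issue does not state one), and String::length stands in for a real JSON deserializer such as dejsonize. It uses java.util.LinkedHashMap in access order together with removeEldestEntry to evict the least recently used entry, caches parse failures the same way jsonValueExpression1 does, and synchronizes every access because a static cache is shared by all threads that use the class.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: a bounded, thread-safe LRU cache of parse results (not Flink code). */
final class ParseResultCache<V> {

    /** Outcome of one parse attempt: either a value or the exception the parser threw. */
    static final class Result<V> {
        final V value;
        final Exception error;

        private Result(V value, Exception error) {
            this.value = value;
            this.error = error;
        }
    }

    /** Hypothetical parser interface, e.g. a JSON deserializer that may throw. */
    interface Parser<V> {
        V parse(String input) throws Exception;
    }

    private final Map<String, Result<V>> entries;
    private final Parser<V> parser;
    private long hits;
    private long misses;

    ParseResultCache(int capacity, Parser<V> parser) {
        this.parser = parser;
        // accessOrder = true: iteration starts at the least recently accessed entry,
        // so removeEldestEntry evicts the LRU entry once the map exceeds capacity.
        this.entries = new LinkedHashMap<String, Result<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Result<V>> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached result for input, parsing and caching it on a miss. */
    synchronized Result<V> get(String input) {
        Result<V> cached = entries.get(input);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        Result<V> result;
        try {
            result = new Result<>(parser.parse(input), null);
        } catch (Exception e) {
            // As in jsonValueExpression1, a failed parse is cached too.
            result = new Result<>(null, e);
        }
        entries.put(input, result);
        return result;
    }

    synchronized long hits() {
        return hits;
    }

    synchronized long misses() {
        return misses;
    }

    public static void main(String[] args) {
        // Mirrors the benchmark loop: one input string looked up 1,000,000 times.
        // String::length is a stand-in parser, not a JSON parser.
        ParseResultCache<Integer> cache = new ParseResultCache<Integer>(1024, String::length);
        String input = "{\"social\":[]}";
        for (int i = 0; i < 1_000_000; i++) {
            cache.get(input);
        }
        // prints "hits=999999 misses=1"
        System.out.println("hits=" + cache.hits() + " misses=" + cache.misses());
    }
}
{code}

With a single input string repeated 1,000,000 times, as in the benchmark above, only the first call parses and the remaining 999,999 are hits, so that benchmark shows the best case for the cache. In the json_value query pattern, hits would come mainly from the repeated calls on the same value of A within a row. One lock keeps the sketch simple; with many concurrent tasks, a concurrent or per-thread cache might be preferable.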