[ https://issues.apache.org/jira/browse/FLINK-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969271#comment-16969271 ]
Jiayi Liao commented on FLINK-14608:
------------------------------------

[~dwysakowicz] Thanks for pointing this out. I should have provided this benchmark before making the change. I ran a simple performance benchmark: iterating 1,000,000 times over different fixed arrays of length 100, the stream-based and loop-based versions differ in performance by more than 10%. The code is shown below:

{code:java}
import java.lang.reflect.Array;
import java.util.List;

import static java.util.Spliterators.spliterator;
import static java.util.stream.StreamSupport.stream;

// `mode` and `generateTestString(...)` belong to the benchmark harness (not shown).
List<List<String>> arrayList = generateTestString(Integer.parseInt(args[1]), Integer.parseInt(args[2]));
if (mode.equals("stream")) {
    // Stream-based variant: wrap each list's iterator in a sized Spliterator.
    for (List<String> testArray : arrayList) {
        stream(spliterator(testArray.iterator(), testArray.size(), 0), false)
                .map(s -> s.substring(3, s.length() - 2))
                .toArray();
    }
} else {
    // Loop-based variant: preallocate the result array and fill it with a plain for loop.
    for (List<String> testArray : arrayList) {
        final Object[] array = (Object[]) Array.newInstance(String.class, testArray.size());
        for (int i = 0; i < testArray.size(); i++) {
            final String s = testArray.get(i);
            array[i] = s.substring(3, s.length() - 2);
        }
    }
}
{code}

> avoid using Java Streams in JsonRowDeserializationSchema
> --------------------------------------------------------
>
>                 Key: FLINK-14608
>                 URL: https://issues.apache.org/jira/browse/FLINK-14608
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>    Affects Versions: 1.10.0
>            Reporter: Kurt Young
>            Assignee: Jiayi Liao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.10.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> According to
> [https://flink.apache.org/contributing/code-style-and-quality-java.html], we
> should avoid using Java Streams in any performance-critical code. Since this
> `DeserializationRuntimeConverter` is called for every field of each incoming
> record, we should provide an implementation that does not use Java Streams.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
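The two variants in the benchmark from the comment above can be packaged as a self-contained sketch. This is an illustration, not Flink code: the class `StreamVsLoop`, the methods `viaStream`/`viaLoop`, and the generator `generateTestStrings` (a hypothetical stand-in for the comment's unshown `generateTestString`) are assumptions, and every test string is assumed to be at least 5 characters long so that `substring(3, s.length() - 2)` is valid. It only checks that both variants produce the same elements; it does not measure timing.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import static java.util.Spliterators.spliterator;
import static java.util.stream.StreamSupport.stream;

public class StreamVsLoop {

    // Stream variant: wraps the list's iterator in a sized Spliterator, as in the benchmark.
    static Object[] viaStream(List<String> input) {
        return stream(spliterator(input.iterator(), input.size(), 0), false)
                .map(s -> s.substring(3, s.length() - 2))
                .toArray();
    }

    // Loop variant: preallocates a String[] (typed as Object[]) and fills it index by index.
    static Object[] viaLoop(List<String> input) {
        final Object[] array = (Object[]) Array.newInstance(String.class, input.size());
        for (int i = 0; i < input.size(); i++) {
            final String s = input.get(i);
            array[i] = s.substring(3, s.length() - 2);
        }
        return array;
    }

    // Hypothetical data generator (assumption): builds `count` lists of `length` strings,
    // each shaped like "abc<c>_<i>xy" so it is long enough for substring(3, len - 2).
    static List<List<String>> generateTestStrings(int count, int length) {
        List<List<String>> result = new ArrayList<>(count);
        for (int c = 0; c < count; c++) {
            List<String> row = new ArrayList<>(length);
            for (int i = 0; i < length; i++) {
                row.add("abc" + c + "_" + i + "xy");
            }
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> data = generateTestStrings(1_000, 100);
        for (List<String> row : data) {
            // Arrays.equals compares element by element with equals().
            if (!Arrays.equals(viaStream(row), viaLoop(row))) {
                throw new AssertionError("variants disagree");
            }
        }
        System.out.println("both variants agree on " + data.size() + " arrays");
    }
}
```

Because both methods return the same elements, swapping one for the other changes only how the array is built, which is the kind of replacement the issue asks for in the per-field converter.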