[ https://issues.apache.org/jira/browse/FLINK-31240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17726219#comment-17726219 ]
Dong Lin commented on FLINK-31240:
----------------------------------

Merged to the master branch as commit 6b6df3db466d6a030d5a38ec786ac3297cb41c38.

> Reduce the overhead of conversion between DataStream and Table
> --------------------------------------------------------------
>
>                 Key: FLINK-31240
>                 URL: https://issues.apache.org/jira/browse/FLINK-31240
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / API
>            Reporter: Jiang Xin
>            Assignee: Yunfeng Zhou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.18.0
>
>
> In some cases, users may need to convert the underlying DataStream to a Table
> and then convert it back to a DataStream (e.g. some Flink ML libraries accept a
> Table as input and convert it to a DataStream for calculation). This causes
> unnecessary overhead, because each record is converted between the external
> data type and the internal data type and back again.
> We can reduce the overhead by checking whether a `fromDataStream` call is
> paired with a `toDataStream` call with no transformation in between and, if
> so, using the source DataStream directly.
> The performance of Flink ML's Bucketizer algorithm[1] is used to demonstrate
> the impact of this optimization. The execution time is the median across
> 5 runs for each setup.
> Before optimization: 40746 ms
> After optimization: 12972 ms
> Thus this optimization reduces the total execution time of Flink ML's
> Bucketizer algorithm to about 1/3 of the original.
> [1]
> https://github.com/apache/flink-ml/blob/master/flink-ml-benchmark/src/main/resources/bucketizer-benchmark.json

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
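
To illustrate the idea described in the issue, here is a minimal sketch in plain Java. It is a toy model only and not the code merged for FLINK-31240: `ToyStream`, `ToyTable`, `ToyEnv`, `addOne`, and the `conversions` counter are hypothetical names invented for this example, and the "internal data type" is assumed to be a one-field `Object[]` row. The point is only to show how a table that remembers its untouched source stream lets `toDataStream` skip both conversions.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a DataStream of external-format records (hypothetical, not a Flink class). */
final class ToyStream {
    final List<Integer> records;
    ToyStream(List<Integer> records) { this.records = records; }
}

/** Toy stand-in for a Table (hypothetical, not a Flink class). */
final class ToyTable {
    final ToyStream source;     // non-null only while the table is an untouched view of a stream
    final List<Object[]> rows;  // internal-format rows, non-null once a transformation has run
    ToyTable(ToyStream source, List<Object[]> rows) {
        this.source = source;
        this.rows = rows;
    }
}

/** Toy environment that counts per-record format conversions. */
final class ToyEnv {
    int conversions = 0;

    // Wrap the stream lazily: no record is converted yet.
    ToyTable fromDataStream(ToyStream stream) {
        return new ToyTable(stream, null);
    }

    // A real transformation: it needs internal rows, so the conversion cannot be skipped.
    ToyTable addOne(ToyTable table) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : internalRows(table)) {
            out.add(new Object[] {(Integer) row[0] + 1});
        }
        return new ToyTable(null, out);
    }

    ToyStream toDataStream(ToyTable table) {
        // Paired fromDataStream/toDataStream with nothing in between:
        // hand back the source stream directly, with zero conversions.
        if (table.source != null) {
            return table.source;
        }
        List<Integer> out = new ArrayList<>();
        for (Object[] row : table.rows) {
            out.add((Integer) row[0]); // internal -> external
            conversions++;
        }
        return new ToyStream(out);
    }

    private List<Object[]> internalRows(ToyTable table) {
        if (table.rows != null) {
            return table.rows;
        }
        List<Object[]> out = new ArrayList<>();
        for (Integer value : table.source.records) {
            out.add(new Object[] {value}); // external -> internal
            conversions++;
        }
        return out;
    }
}
```

In this toy model, `env.toDataStream(env.fromDataStream(src))` returns the very same `src` object and leaves `conversions` at 0, while inserting `addOne` between the two calls on a three-record stream pays for three conversions in each direction. How Flink itself detects the paired calls is not described in the issue text, so the `source` field here should be read as an assumption made for illustration.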