[ https://issues.apache.org/jira/browse/FLINK-21203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278643#comment-17278643 ]
wangpeibin commented on FLINK-21203:
------------------------------------

Hi [~jark] & [~Leonard Xu],

Could you review this PR: [GitHub Pull Request #14863|https://github.com/apache/flink/pull/14863]

> Don’t collect -U&+U Row When they are equals In the LastRowFunction
> ---------------------------------------------------------------------
>
>                 Key: FLINK-21203
>                 URL: https://issues.apache.org/jira/browse/FLINK-21203
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Runtime
>            Reporter: wangpeibin
>            Assignee: wangpeibin
>            Priority: Major
>              Labels: pull-request-available
>
> In the LastRowFunction, the -U and +U rows are collected even when they are
> identical, which adds unnecessary work for the downstream operator.
>
> To avoid this, we can optimize the logic of DeduplicateFunctionHelper. A
> config option to enable the optimization will also be added.
> With the following SQL:
> {quote}select * from
> (select
>  *,
>  row_number() over (partition by k order by proctime() desc ) as row_num
>  from a
> ) t
> where row_num = 1
> {quote}
> Then input two rows such as:
> {quote}Event("B","1","b"),
> Event("B","1","b")
> {quote}
> Currently the output is:
> {quote}(true,+I[B, 1, b, 1])
> (false,-U[B, 1, b, 1])
> (true,+U[B, 1, b, 1])
> {quote}
> After the optimization, the output will be:
> {quote}(true,+I[B, 1, b, 1])
> {quote}
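
To illustrate the idea, here is a minimal, self-contained sketch of keep-last-row deduplication that skips the -U/+U pair when the incoming row equals the stored one. It is not the code from the pull request and does not use Flink's DeduplicateFunctionHelper; the class name LastRowSketch, the method process, and the flag skipUnchanged are hypothetical names chosen for illustration, and rows are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for last-row deduplication; not Flink's actual implementation. */
class LastRowSketch {
    // Last row seen per key (plays the role of keyed state in this sketch).
    private final Map<String, String> lastRowByKey = new HashMap<>();
    // Assumed switch standing in for the config option mentioned in the issue.
    private final boolean skipUnchanged;

    LastRowSketch(boolean skipUnchanged) {
        this.skipUnchanged = skipUnchanged;
    }

    /** Returns the changelog messages emitted for one input row. */
    List<String> process(String key, String row) {
        List<String> out = new ArrayList<>();
        String previous = lastRowByKey.put(key, row);
        if (previous == null) {
            // First row for this key: emit an insert.
            out.add("+I[" + row + "]");
        } else if (skipUnchanged && previous.equals(row)) {
            // Row is unchanged: emit nothing, sparing the downstream operator.
        } else {
            // Row changed (or the optimization is off): retract, then update.
            out.add("-U[" + previous + "]");
            out.add("+U[" + row + "]");
        }
        return out;
    }

    public static void main(String[] args) {
        LastRowSketch dedup = new LastRowSketch(true);
        System.out.println(dedup.process("B", "B, 1, b, 1"));
        System.out.println(dedup.process("B", "B, 1, b, 1"));
    }
}
```

With skipUnchanged set to false the second identical row produces the -U/+U pair, matching the current behaviour described in the issue; with it set to true only the initial +I is emitted.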