[ 
https://issues.apache.org/jira/browse/FLINK-21203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276066#comment-17276066
 ] 

Jark Wu commented on FLINK-21203:
---------------------------------

I think we may not need the configuration. Group Aggregate already has similar 
logic when state TTL is not enabled. 
We may need to enable this optimization only when state TTL is disabled, just 
like the implementation in 
https://github.com/apache/flink/blob/f3db4220f5c8730e065734cff16237c7743b390f/flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/runtime/operators/aggregate/GroupAggFunction.java#L170
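
A minimal sketch of the condition described above, written as a standalone 
toy class rather than the actual Flink operator. The class name 
LastRowSketch, the process method, and the string-based rows are all 
hypothetical and only illustrate the idea: when state TTL is disabled and the 
new last row equals the previous one, nothing is emitted; when state TTL is 
enabled, the -U/+U pair is still emitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a keep-last-row deduplication function.
// Rows are plain strings here; a real operator would use RowData and keyed state.
class LastRowSketch {
    private final boolean stateTtlEnabled;
    private final Map<String, String> lastRowByKey = new HashMap<>();

    LastRowSketch(boolean stateTtlEnabled) {
        this.stateTtlEnabled = stateTtlEnabled;
    }

    // Returns the changelog messages produced for one input row.
    List<String> process(String key, String row) {
        List<String> out = new ArrayList<>();
        String prev = lastRowByKey.get(key);
        if (prev == null) {
            // First row for this key: plain insert.
            out.add("+I[" + row + "]");
        } else if (!stateTtlEnabled && prev.equals(row)) {
            // Same row as before and state TTL is disabled:
            // skip the redundant -U/+U pair entirely.
            return out;
        } else {
            // Row changed, or state TTL is enabled: emit the update pair.
            out.add("-U[" + prev + "]");
            out.add("+U[" + row + "]");
        }
        lastRowByKey.put(key, row);
        return out;
    }
}
```

With state TTL disabled, feeding the same row twice for key "B" yields only 
the initial +I message; with state TTL enabled, the second row still produces 
the -U/+U pair, as it does today.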

>  Don’t collect -U&+U Row When they are equals In the LastRowFunction 
> ---------------------------------------------------------------------
>
>                 Key: FLINK-21203
>                 URL: https://issues.apache.org/jira/browse/FLINK-21203
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Runtime
>            Reporter: wangpeibin
>            Assignee: wangpeibin
>            Priority: Major
>
> In the LastRowFunction, the -U and +U rows are collected even if they are 
> identical, which increases the computational load on the downstream operator.
>  
> To avoid this, we can optimize the logic of DeduplicateFunctionHelper. Also, 
> a config option to enable the optimization will be added.
> With the following SQL:
> {quote}select * from
>  (select
>  *,
>  row_number() over (partition by k order by proctime() desc ) as row_num
>  from a
>  ) t
>  where row_num = 1
> {quote}
> Then input two rows such as:
> {quote}Event("B","1","b"),
>  Event("B","1","b")
> {quote}
> Now the output is:
> {quote}(true,+I[B, 1, b, 1])
>  (false,-U[B, 1, b, 1])
>  (true,+U[B, 1, b, 1])
> {quote}
> After the optimization, the output will be:
> {quote}(true,+I[B, 1, b, 1])
> {quote}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
