[jira] [Commented] (FLINK-19896) Support first-n-rows deduplication in the Deduplicate operator

Jark Wu (Jira) Sat, 31 Oct 2020 21:34:39 -0700


    [ https://issues.apache.org/jira/browse/FLINK-19896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224217#comment-17224217 ]


Jark Wu commented on FLINK-19896:
---------------------------------

I agree this would be a nice improvement. But please do not put this logic 
into "deduplicate": the corresponding physical node is Rank rather than 
Deduplicate, and the existing "deduplicate" is already complex. This could 
instead be an improvement to {{AppendOnlyTopNFunction}}, or a variant of it.

> Support first-n-rows deduplication in the Deduplicate operator
> --------------------------------------------------------------
>
>                 Key: FLINK-19896
>                 URL: https://issues.apache.org/jira/browse/FLINK-19896
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Planner, Table SQL / Runtime
>            Reporter: Jun Zhang
>            Priority: Major
>
> Currently, the Deduplicate operator supports only first-row deduplication 
> (ordered by proc-time). For first-n-rows deduplication, the planner has to 
> fall back to the Rank operator. However, the Rank operator is less efficient 
> than Deduplicate because of its larger state and more state accesses.
> This issue proposes extending DeduplicateKeepFirstRowFunction to support 
> first-n-rows deduplication. The original first-row deduplication would then 
> be a special case of first-n-rows deduplication (n = 1).
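
To make the proposed semantics concrete, here is a minimal, Flink-free sketch of first-n-rows deduplication on an append-only stream in arrival order. It is not the actual operator code and does not settle which operator should host the logic. The class name FirstNRowsDedupSketch, the accept method, and the in-memory map standing in for keyed state are all hypothetical. The sketch assumes that a per-key counter is enough state, because rows that have already been emitted never need to be revisited.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Flink.
final class FirstNRowsDedupSketch<K> {
    private final int n;
    // Stands in for keyed state: one counter per key, no stored rows.
    private final Map<K, Integer> seenPerKey = new HashMap<>();

    FirstNRowsDedupSketch(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        this.n = n;
    }

    /** Returns true if this arrival is among the first n rows seen for the key. */
    boolean accept(K key) {
        int count = seenPerKey.getOrDefault(key, 0);
        if (count >= n) {
            return false; // key already has n rows, drop without touching state
        }
        seenPerKey.put(key, count + 1);
        return true;
    }

    public static void main(String[] args) {
        FirstNRowsDedupSketch<String> dedup = new FirstNRowsDedupSketch<>(2);
        String[] keys = {"a", "a", "b", "a", "b", "b"};
        List<String> emitted = new ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            if (dedup.accept(keys[i])) {
                emitted.add(keys[i] + "#" + i);
            }
        }
        System.out.println(emitted); // prints [a#0, a#1, b#2, b#4]
    }
}
```

With n = 1 the sketch reduces to first-row deduplication, which matches the special case described in the issue.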



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
