[ https://issues.apache.org/jira/browse/FLINK-19896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jun Zhang updated FLINK-19896:
------------------------------
    Description:
Currently, the Deduplicate operator only supports first-row deduplication (ordered by processing time). For first-n-rows deduplication, the planner has to fall back to the Rank operator. However, the Rank operator is less efficient than the Deduplicate operator because it keeps larger state and accesses state more often.
This issue proposes extending DeduplicateKeepFirstRowFunction to support first-n-rows deduplication. The original first-row deduplication then becomes a special case of first-n-rows deduplication (n = 1).

  was:
Currently, the Deduplicate operator only supports first-row deduplication (ordered by processing time). For first-n-rows deduplication, the planner has to fall back to the Rank operator. However, the Rank operator is less efficient than the Deduplicate operator because it accesses state more often.
This issue proposes extending DeduplicateKeepFirstRowFunction to support first-n-rows deduplication. The original first-row deduplication then becomes a special case of first-n-rows deduplication (n = 1).


> Support first-n-rows deduplication in the Deduplicate operator
> --------------------------------------------------------------
>
>                 Key: FLINK-19896
>                 URL: https://issues.apache.org/jira/browse/FLINK-19896
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Planner, Table SQL / Runtime
>    Affects Versions: 1.12.0, 1.11.3
>            Reporter: Jun Zhang
>            Priority: Major
>             Fix For: 1.11.2
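A minimal sketch of the proposed first-n-rows semantics, assuming that rows already arrive in processing-time order and that the per-key state is nothing more than a counter of rows emitted so far. Under that assumption the first n arrivals for a key are exactly the first n rows, so no sorted row buffer (as a Rank-style approach would keep) is needed. The class `FirstNRowsDedup` and its methods `accept` and `firstN` are hypothetical names, and a `HashMap` stands in for Flink's keyed state; this is not the actual `DeduplicateKeepFirstRowFunction` code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: first-n-rows deduplication over rows that arrive in
// processing-time order. A HashMap stands in for Flink's per-key state;
// this is NOT the actual DeduplicateKeepFirstRowFunction implementation.
class FirstNRowsDedup<K> {
    private final int n;                                     // rows to keep per key; n = 1 is first-row dedup
    private final Map<K, Integer> emitted = new HashMap<>(); // per-key state: only a counter, never the rows

    FirstNRowsDedup(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        this.n = n;
    }

    // Returns true if this arrival is among the first n rows seen for its key.
    boolean accept(K key) {
        int count = emitted.getOrDefault(key, 0); // one state read per row
        if (count >= n) {
            return false;                         // later row: drop it, no state write needed
        }
        emitted.put(key, count + 1);              // one state write per emitted row
        return true;
    }

    // Convenience helper: filters rows (given in arrival order) down to the first n per key.
    static <K, R> List<R> firstN(List<R> rowsInArrivalOrder, Function<R, K> keyOf, int n) {
        FirstNRowsDedup<K> dedup = new FirstNRowsDedup<>(n);
        List<R> out = new ArrayList<>();
        for (R row : rowsInArrivalOrder) {
            if (dedup.accept(keyOf.apply(row))) {
                out.add(row);
            }
        }
        return out;
    }
}
```

For example, with rows `a1, b1, a2, a3, b2, a4` keyed by their first letter, `firstN(rows, r -> r.substring(0, 1), 2)` keeps `a1, b1, a2, b2`, and the same call with `n = 1` keeps only `a1, b1`, which is the existing first-row behaviour.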