[jira] [Commented] (FLINK-12173) Optimize "SELECT DISTINCT" into Deduplicate with keep first row

lincoln lee (Jira) Tue, 17 Dec 2024 18:55:28 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-12173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906590#comment-17906590
 ]


lincoln lee commented on FLINK-12173:
-------------------------------------

[~jhughes] For performance testing, the harness test can serve as a complement 
(similar to a benchmark for the operator itself), but integrated performance 
tests are also needed, such as the TPC test[1] for batch scenarios and the 
Nexmark test for streaming scenarios.
The current Nexmark cases don't hit this scenario, so the benchmark queries 
(and/or the test data) need to be extended to validate this change.
I remember there was a FLIP[2] that provided similar test data[3]; this may be 
of some help.
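
As a rough sketch of the kind of query extension meant here (an illustration 
only, not an existing Nexmark query; it assumes the benchmark's bid table with 
auction and bidder columns), a plain SELECT DISTINCT would exercise the 
optimized path:
{code:sql}
-- Hypothetical extra benchmark query: plain SELECT DISTINCT over the bid stream.
-- Table and column names are assumed from the Nexmark schema.
SELECT DISTINCT auction, bidder FROM bid;
{code}
The test data would probably also need enough repeated (auction, bidder) pairs 
for the deduplication to matter.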

[1] https://github.com/ververica/flink-sql-benchmark/commits/master/

[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch

[3] https://docs.google.com/document/d/1FW9pqyhyswTVGTJN0R3U9pq4eWzPBEkKOTiM1C_3968/edit?tab=t.0

 

> Optimize "SELECT DISTINCT" into Deduplicate with keep first row
> ---------------------------------------------------------------
>
>                 Key: FLINK-12173
>                 URL: https://issues.apache.org/jira/browse/FLINK-12173
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Planner
>            Reporter: Jark Wu
>            Assignee: Yiyu Tian
>            Priority: Major
>              Labels: pull-request-available
>
> The following distinct query can be optimized into deduplicate on keys "a, b, 
> c, d" and keep the first row.
> {code:sql}
> SELECT DISTINCT a, b, c, d;
> {code}
> We can optimize this query into Deduplicate to get better performance than 
> GroupAggregate.
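
For reference, the rewrite described in the issue roughly corresponds to Flink 
SQL's documented deduplication pattern written by hand. The sketch below is an 
illustration under assumptions, not the planner rule itself: the table name T 
and the processing-time attribute proctime are hypothetical, since the issue's 
query does not name a source table.
{code:sql}
-- Keep only the first row seen for each distinct (a, b, c, d) combination.
-- T and proctime are assumed names; proctime must be a time attribute of T.
SELECT a, b, c, d
FROM (
  SELECT a, b, c, d,
         ROW_NUMBER() OVER (
           PARTITION BY a, b, c, d
           ORDER BY proctime ASC
         ) AS rn
  FROM T
)
WHERE rn = 1;
{code}
Ordering by the time attribute ascending and filtering on rn = 1 is what makes 
this a keep-first-row deduplication; since later rows with the same key are 
dropped, the result matches SELECT DISTINCT on an insert-only input.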



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
