[ https://issues.apache.org/jira/browse/FLINK-12173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906590#comment-17906590 ]
lincoln lee commented on FLINK-12173:
-------------------------------------

[~jhughes] For performance testing, the harness test can be a complement (similar to a benchmark for the operator itself), but there is also a need for integrated performance tests, like the TPC test [1] for batch scenarios and the Nexmark test for streaming scenarios.

The current Nexmark cases don't hit this scenario, so the benchmark queries (and/or the test data) need to be extended to validate this change. I remember there was a FLIP [2] that provided similar test data [3]; this may be of some help.

[1] [https://github.com/ververica/flink-sql-benchmark/commits/master/]
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch
[3] https://docs.google.com/document/d/1FW9pqyhyswTVGTJN0R3U9pq4eWzPBEkKOTiM1C_3968/edit?tab=t.0

> Optimize "SELECT DISTINCT" into Deduplicate with keep first row
> ---------------------------------------------------------------
>
> Key: FLINK-12173
> URL: https://issues.apache.org/jira/browse/FLINK-12173
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Planner
> Reporter: Jark Wu
> Assignee: Yiyu Tian
> Priority: Major
> Labels: pull-request-available
>
> The following distinct query can be optimized into a Deduplicate on keys "a, b, c, d" that keeps the first row.
> {code:sql}
> SELECT DISTINCT a, b, c, d;
> {code}
> We can optimize this query into Deduplicate to get better performance than GroupAggregate.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)