Hi, Should we also discuss the design of DISTINCT for the streaming case? It could fit well in the context of tables as well as in the context of aggregates over windows.
Dr. Radu Tudoran
Senior Research Engineer - Big Data Expert
IT R&D Division, HUAWEI TECHNOLOGIES Duesseldorf GmbH, European Research Center

-----Original Message-----
From: Fabian Hueske (JIRA) [mailto:j...@apache.org]
Sent: Monday, February 06, 2017 2:56 PM
To: dev@flink.apache.org
Subject: [jira] [Created] (FLINK-5722) Implement DISTINCT as dedicated operator

Fabian Hueske created FLINK-5722:
------------------------------------

Summary: Implement DISTINCT as dedicated operator
Key: FLINK-5722
URL: https://issues.apache.org/jira/browse/FLINK-5722
Project: Flink
Issue Type: Improvement
Components: Table API & SQL
Affects Versions: 1.2.0, 1.3.0
Reporter: Fabian Hueske

DISTINCT is currently implemented for batch Table API / SQL as an aggregate which groups on all fields. Grouped aggregates are implemented as GroupReduce with sort-based combiner. This operator can be more efficiently implemented by using ReduceFunction and hinting a HashCombine strategy.
The same ReduceFunction can be used for all DISTINCT operations and can be assigned with appropriate forward field annotations. We would need a custom conversion rule which translates distinct aggregations (grouping on all fields and returning all fields) into a custom DataSetRelNode.
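
To make the proposal above concrete, here is a minimal plain-Java sketch of the idea. It does not use any Flink classes; the names HashDistinctSketch, keepFirst and distinct are hypothetical and chosen only for illustration. It assumes that "grouping on all fields" can be modelled by using the whole record as the hash key, and that the shared reduce function simply keeps the first of two equal records (which is why every field would count as forwarded).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class HashDistinctSketch {

    // Reducer shared by every DISTINCT: of two equal records, keep the first.
    // It returns its input unchanged, so all fields are forwarded as-is.
    public static <T> BinaryOperator<T> keepFirst() {
        return (first, second) -> first;
    }

    // Hash-based combine: the whole record is the grouping key (all fields),
    // and duplicates are folded with the reducer. The input is never sorted.
    public static <T> List<T> distinct(List<T> records) {
        BinaryOperator<T> reducer = keepFirst();
        Map<T, T> table = new LinkedHashMap<>();
        for (T record : records) {
            // First occurrence is inserted; later equal records are reduced away.
            table.merge(record, record, reducer);
        }
        return new ArrayList<>(table.values());
    }

    public static void main(String[] args) {
        // Rows are modelled as lists so that equality covers all fields.
        List<List<Object>> rows = Arrays.asList(
            Arrays.<Object>asList(1, "a"),
            Arrays.<Object>asList(2, "b"),
            Arrays.<Object>asList(1, "a"));
        System.out.println(distinct(rows)); // prints [[1, a], [2, b]]
    }
}
```

Compared with a sort-based combiner, the hash table only holds one entry per distinct record and needs no ordering on the fields, which is the efficiency argument made in the issue description.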