Hi Rong,

Thanks for taking the initiative to improve the support for DISTINCT aggregations! I've made a pass over your design document and left a couple of comments. I think it is a really good write-up and serves as a good start.

IMO, the next steps could be to

1) continue and finalize the discussion on the design doc. Feel free to open a new umbrella JIRA and link your doc there.
2) check which JIRAs are still relevant. Close or reorganize them according to the plan in your design doc and make them subissues of the umbrella issue.
3) add support for DISTINCT in SQL.
4) later extend the Table API to also support distinct aggregations (this would be mostly API changes, since the execution will already have been solved by then).

Let me know what you think.

Best, Fabian

2018-02-14 3:07 GMT+01:00 Rong Rong <walter...@gmail.com>:

> Hi Community,
>
> We are working on support for distinct aggregators over data streams in the
> Table/SQL API. Currently there seem to be many JIRAs related to distinct
> agg over stream use cases which are still pending
> (FLINK-6249 <https://issues.apache.org/jira/browse/FLINK-6249>,
> FLINK-6260 <https://issues.apache.org/jira/browse/FLINK-6260>,
> FLINK-5315 <https://issues.apache.org/jira/browse/FLINK-5315>,
> FLINK-6335 <https://issues.apache.org/jira/browse/FLINK-6335>,
> FLINK-6373 <https://issues.apache.org/jira/browse/FLINK-6373>,
> FLINK-6250 <https://issues.apache.org/jira/browse/FLINK-6250>, etc.),
> and I have some concerns about coming up with a solution, as there might be
> other use cases out there.
>
> I summarized the use cases in a write-up, categorized them into unbounded
> and bounded aggregations, and proposed a solution that modifies and adds
> new distinct aggregate functions using the UDAGG API with DataView. Please
> find it here
> <https://docs.google.com/document/d/1zj6OA-K2hi7ah8Fo-xTQB-mVmYfm6LsN2_NHgTCVmJI/edit?usp=sharing>.
>
> Any comments or suggestions are highly appreciated.
>
> Many Thanks,
> Rong