Hi Fabian, Yea, we can immediately start to work on non-windowed aggregates. But it seems that Calcite’s StreamSQL doesn’t support non-windowed aggregates (also not included in roadmap). So we may need to propose this function back to Calcite community?
- Jark Wu > 在 2016年6月17日,下午5:41,Fabian Hueske <fhue...@gmail.com> 写道: > > Hi Jark Wu, > > I agree about the non-windowed aggregates. If there are actual use cases > for this operator, we should definitely support it. > Since it does not depend on windows or time, we can immediately start to > work on it. In principle, it should be rather easy to implement. > However, we have to check how well it integrates with the current state of > Calcite. > > I think forking off a feature branch is a good idea. We have done that > before (e.g., for porting the Table API on top of Calcite), but it is not > so common in the Flink community. > So I would first send a note to the dev list and check that nobody objects. > > I think we can decouple the development of the Table API and SQL. Although > it is desirable to have the same feature set in both APIs, I would not be > strict about it. > However, the Table API does also depend on Calcite because all Table API > queries go through Calcite's logical plan representation and optimizer. By > decoupling the SQL and Table API feature development, we do not need to > wait for the SQL parser but still might still need certain features in the > logical plan or optimizer. I hope we can solve a lot with custom RelNodes > and optimizer rules which should eventually be contributed back to Calcite. > > Best, Fabian > > > 2016-06-17 9:48 GMT+02:00 Jark Wu <wuchong...@alibaba-inc.com>: > >> Hi Fabian, >> >> There are a lot of our business are using non-windowed aggregations. And >> there is a little difference between non-windowed aggregate and Row window >> operator, as the later is bound to a certain window and emit the result of >> the N rows preceding for every incoming row. However the former emit the >> aggregate result of the whole elements. So I suggest to add them for more >> complete semantic. >> >> Regarding the windowed aggregate task, I’m agree with that and I'm looking >> forward as soon as possible to see the corresponding JIRA issues created. >> After that, we can start working on an independent branch without waiting >> for 1.1 released. But I’m still a little concerned about Calcite’s support, >> as we must waiting for Calcite supporting correspond syntax and the >> version released. If we can separate the task into Table API and SQL , we >> may not be blocked by Calcite too much. >> >> What do you think? >> >> - Jark Wu >> >>> 在 2016年6月16日,下午8:37,Fabian Hueske <fhue...@gmail.com> 写道: >>> >>> Hi Jark, >>> >>> thanks for sharing Blink's Streaming Table API. It seems to be close to >> the >>> DataStream API, while the Table API draft I shared is more similar to >>> Calcite's proposal. >>> You are right, the current draft does not include running (non-windowed) >>> aggregates. We were not sure how useful these are since these aggregates >>> are unbound and might become meaningless after being applied on a very >> long >>> stream. However, we can certainly add them, if users request them. >>> An alternative to running aggregates could be what I called "Row window >>> operators" in the streaming Table API draft. These operators emit an >>> aggregate for each incoming row, however the aggregate is bound to a >>> certain window around the row like the 10 rows preceding the row for >> which >>> the aggregate is computed. Calcite calls these windows "Sliding windows" >>> (Attention: This is different from Flink's terminology, in Flink sliding >>> windows are something different). Row windows are similar to running >>> aggregates in that they emit a row for each incoming row. You can also >>> think of them as a (Flink) sliding count window which is evaluate for >> each >>> incoming record. >>> >>> Further differences are the support of Scalar UDFs in the Table API and >> the >>> support for joins which have not been drafted for the Table API yet. >>> Scalar UDFs are definitely also on our roadmap and with upcoming support >>> for side inputs, the DataStream API will also support more types of >> joins. >>> >>> Regarding the current state of Stream SQL in Calcite I am not up to date. >>> >>> I would propose to start with the effort of adding support for windowed >>> aggregates as follows: >>> >>> 1) Add support to define a timestamp / watermark extractor to tables. >> This >>> includes to define a "quasi-monotone" column in a Table's schema. Calcite >>> will use this information to reason about the validity of a query (making >>> sure that grouping includes at least one monotone attribute). >>> 2) Add support for sorting a stream on the timestamp attribute. While >>> sorting itself is not very exciting, it is an easy operation and can be >>> immediately implemented without worrying about API questions. This will >>> also show how well Calcite supports the reasoning about monotone >> attributes. >>> 3) Add support for tumbling windows. >>> >>> In each of these steps we might need to get involved with the Calcite >>> community, depending on Calcite's current support for "quasi-monotone" >>> attributes, etc. >>> >>> What do you think? >>> >>> Best, Fabian >>> >>> >>> 2016-06-14 11:03 GMT+02:00 Jark Wu <wuchong...@alibaba-inc.com>: >>> >>>> Hi Fabian, >>>> >>>> It’s great to hear that we are going to start it! >>>> >>>> I’m glad to share our current Streaming Table API [1]. I find that that >>>> all aggregation functions are scoped to the defined window in Flink >> Stream >>>> Table API design [2] and Calcite StreamSQL desgin [3]. I’m thinking >> that do >>>> we need global aggregation? The global aggregation means that >> aggregation >>>> is applied only on grouped key not including window which is supported >> in >>>> DataStream `datastream.keyBy(f1).sum(f2)`. >>>> >>>> Since the window syntax of StreamSQL is not implemented yet, so will we >>>> help Calcite community with that first or work code for window+agg Table >>>> API first ? >>>> >>>> >>>> [1] >>>> >> https://docs.google.com/document/d/1KMUzvBAWSyQ39T8MyxUi0zNHyvLUnyGMPA7_RLSDpFw/edit?usp=sharing >>>> < >>>> >> https://docs.google.com/document/d/1KMUzvBAWSyQ39T8MyxUi0zNHyvLUnyGMPA7_RLSDpFw/edit?usp=sharing >>>>> >>>> [2] >>>> >> https://docs.google.com/document/d/19kSOAOINKCSWLBCKRq2WvNtmuaA9o3AyCh2ePqr3V5E/edit# >>>> < >>>> >> https://docs.google.com/document/d/19kSOAOINKCSWLBCKRq2WvNtmuaA9o3AyCh2ePqr3V5E/edit# >>>>> >>>> [3] https://calcite.apache.org/docs/stream.html#tumbling-windows < >>>> https://calcite.apache.org/docs/stream.html#tumbling-windows> >>>> >>>> >>>> - Jark Wu >>>> >>>>> 在 2016年6月14日,上午1:10,Fabian Hueske <fhue...@gmail.com> 写道: >>>>> >>>>> Hi Jark, >>>>> >>>>> wow, that's good news! >>>>> You are right, the streaming Table API is currently very limited. In >> the >>>>> last month's we changed the internal architecture and put everything on >>>> top >>>>> of Apache Calcite. >>>>> For the upcoming 1.1 release, we won't add new features to the Table >> API >>>> / >>>>> SQL. However for the 1.2 release, it we plan to focus on the streaming >>>>> Table API and Stream SQL to add support for windowed aggregates and >>>> joins, >>>>> which corresponds to Task 7 and 9 in the design document. You are >>>>> completely right, that we should start to add tickets to JIRA for >> this. I >>>>> will do that tomorrow. >>>>> >>>>> It is great that you have already working code for windowed aggregates >>>> and >>>>> joins! Here is a link to our current API draft for windows in the Table >>>> API >>>>> [1]. Would be great if you could share how your API looks like. As you >>>>> said, the internals have changed a lot by now, but we might want to >> reuse >>>>> your API for Table API windows and maybe the code of the runtime. >>>> However, >>>>> we need to go through Calcite for optimization and SQL support, so some >>>>> parts need to be definitely changed. Stream SQL is also on the roadmap >> of >>>>> the Calcite community, but it might be that some features that we will >>>> need >>>>> are not completed yet. So, maybe we help the Calcite community with >> that >>>> as >>>>> well. >>>>> >>>>> If you want to contribute, you should first read the how to contribute >>>>> guide [2] and guide for code contributions [3]. >>>>> The general rule is to first open a JIRA and later a pull request. >>>> Changes >>>>> that are extensive or modify current behavior (except bugs) should be >>>>> discussed before starting to work on them. >>>>> >>>>> Looking forward to work with you on Flink, >>>>> Fabian >>>>> >>>>> [1] >>>>> >>>> >> https://docs.google.com/document/d/19kSOAOINKCSWLBCKRq2WvNtmuaA9o3AyCh2ePqr3V5E/edit#heading=h.3iw7frfjdcb2 >>>>> [2] http://flink.apache.org/how-to-contribute.html >>>>> [3] http://flink.apache.org/contribute-code.html >>>>> >>>>> >>>>> 2016-06-13 11:31 GMT+02:00 Jark Wu <wuchong...@alibaba-inc.com>: >>>>> >>>>>> Hi, >>>>>> >>>>>> We have a great interest in the new Table API & SQL. In Alibaba, we >> have >>>>>> added a lot of features (groupBy, agg, window, join, UDF …) to >> Streaming >>>>>> Table API (base on Flink 1.0). Now, many jobs run on Table API in >>>>>> production environment. But we want to keep pace with the community, >>>> and we >>>>>> have noticed that Flink Community reworked the Table API and also >>>> supported >>>>>> SQL. That is really cool. However, the Streaming Table API is still so >>>>>> weak. So we want to contribute to accelerate the Streaming Table API >> and >>>>>> StreamSQL growth. >>>>>> >>>>>> It seems that we have complete Task-5 and Task-6 mentioned in the Work >>>>>> Plan < >>>>>> >>>> >> https://docs.google.com/document/d/1TLayJNOTBle_-m1rQfgA6Ouj1oYsfqRjPcp1h2TVqdI/edit# >>>>> . >>>>>> So can we start Task-7 and Task-9 now? Is there any more specific >>>> plans? I >>>>>> think it’s better to create an umbrella JIRA like FLINK-3221 to make >> the >>>>>> develop plan clearer. >>>>>> >>>>>> If I want to contribute code for groupBy and agg function, what >> should I >>>>>> do? As I didn’t find related JIRAs, can I create JIRA and pull a >> request >>>>>> directly? >>>>>> >>>>>> Sorry for so many questions at a time. >>>>>> >>>>>> >>>>>> >>>>>> - Jark Wu (wuchong) >>>>>> >>>>>> >>>> >>>> >> >>