[ https://issues.apache.org/jira/browse/FLINK-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653869#comment-15653869 ]
Jark Wu commented on FLINK-5047: -------------------------------- Hi [~fhueske] Agree. I think the first approach can be easily supported after FLINK-4692 resolved. Regarding to {quote} it a) is based on the implementation that supports non-combinable aggregates (which is required in any case) {quote} If I understand correctly, the third approach doesn't support non-combinable aggregates such as median, right ? It's only an optimization for pre-aggregation which is better than approach-2 , right? > Add sliding group-windows for batch tables > ------------------------------------------ > > Key: FLINK-5047 > URL: https://issues.apache.org/jira/browse/FLINK-5047 > Project: Flink > Issue Type: Sub-task > Components: Table API & SQL > Reporter: Jark Wu > > Add Slide group-windows for batch tables as described in > [FLIP-11|https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations]. > There are two ways to implement sliding windows for batch: > 1. replicate the output in order to assign keys for overlapping windows. This > is probably the more straight-forward implementation and supports any > aggregation function but blows up the data volume. > 2. if the aggregation functions are combinable / pre-aggregatable, we can > also find the largest tumbling window size from which the sliding windows can > be assembled. This is basically the technique used to express sliding windows > with plain SQL (GROUP BY + OVER clauses). For a sliding window Slide(10 > minutes, 2 minutes) this would mean to first compute aggregates of > non-overlapping (tumbling) 2 minute windows and assembling consecutively 5 of > these into a sliding window (could be done in a MapPartition with sorted > input). The implementation could be done as an optimizer rule to split the > sliding aggregate into a tumbling aggregate and a SQL WINDOW operator. Maybe > it makes sense to implement the WINDOW clause first and reuse this for > sliding windows. > 3. There is also a third, hybrid solution: Doing the pre-aggregation on the > largest non-overlapping windows (as in 2) and replicating these results and > processing those as in the 1) approach. The benefits of this is that it a) is > based on the implementation that supports non-combinable aggregates (which is > required in any case) and b) that it does not require the implementation of > the SQL WINDOW operator. Internally, this can be implemented again as an > optimizer rule that translates the SlidingWindow into a pre-aggregating > TublingWindow and a final SlidingWindow (with replication). > see FLINK-4692 for more discussion -- This message was sent by Atlassian JIRA (v6.3.4#6332)