[ https://issues.apache.org/jira/browse/FLINK-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653968#comment-15653968 ]
Fabian Hueske commented on FLINK-5047: -------------------------------------- Right, the third approach does only work for combinable aggregates and reduces the amount of replicated data because only pre-aggregates are replicated. I'd prefer it over approach 2 because it is easier to implement (it extends approach 1) than approach 2 which would require an implementation for SQL Window. > Add sliding group-windows for batch tables > ------------------------------------------ > > Key: FLINK-5047 > URL: https://issues.apache.org/jira/browse/FLINK-5047 > Project: Flink > Issue Type: Sub-task > Components: Table API & SQL > Reporter: Jark Wu > > Add Slide group-windows for batch tables as described in > [FLIP-11|https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations]. > There are two ways to implement sliding windows for batch: > 1. replicate the output in order to assign keys for overlapping windows. This > is probably the more straight-forward implementation and supports any > aggregation function but blows up the data volume. > 2. if the aggregation functions are combinable / pre-aggregatable, we can > also find the largest tumbling window size from which the sliding windows can > be assembled. This is basically the technique used to express sliding windows > with plain SQL (GROUP BY + OVER clauses). For a sliding window Slide(10 > minutes, 2 minutes) this would mean to first compute aggregates of > non-overlapping (tumbling) 2 minute windows and assembling consecutively 5 of > these into a sliding window (could be done in a MapPartition with sorted > input). The implementation could be done as an optimizer rule to split the > sliding aggregate into a tumbling aggregate and a SQL WINDOW operator. Maybe > it makes sense to implement the WINDOW clause first and reuse this for > sliding windows. > 3. There is also a third, hybrid solution: Doing the pre-aggregation on the > largest non-overlapping windows (as in 2) and replicating these results and > processing those as in the 1) approach. The benefits of this is that it a) is > based on the implementation that supports non-combinable aggregates (which is > required in any case) and b) that it does not require the implementation of > the SQL WINDOW operator. Internally, this can be implemented again as an > optimizer rule that translates the SlidingWindow into a pre-aggregating > TublingWindow and a final SlidingWindow (with replication). > see FLINK-4692 for more discussion -- This message was sent by Atlassian JIRA (v6.3.4#6332)