[ https://issues.apache.org/jira/browse/FLINK-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653975#comment-15653975 ]
Jark Wu commented on FLINK-5047:
--------------------------------

Makes sense. I prefer the third approach too.

> Add sliding group-windows for batch tables
> ------------------------------------------
>
>                 Key: FLINK-5047
>                 URL: https://issues.apache.org/jira/browse/FLINK-5047
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: Jark Wu
>
> Add Slide group-windows for batch tables as described in
> [FLIP-11|https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations].
> There are two ways to implement sliding windows for batch:
> 1. Replicate the output in order to assign keys for overlapping windows.
> This is probably the more straightforward implementation and supports any
> aggregation function, but it blows up the data volume.
> 2. If the aggregation functions are combinable / pre-aggregatable, we can
> also find the largest tumbling window size from which the sliding windows
> can be assembled. This is basically the technique used to express sliding
> windows with plain SQL (GROUP BY + OVER clauses). For a sliding window
> Slide(10 minutes, 2 minutes) this would mean first computing aggregates of
> non-overlapping (tumbling) 2-minute windows and then assembling 5
> consecutive ones into a sliding window (could be done in a MapPartition
> with sorted input). The implementation could be done as an optimizer rule
> that splits the sliding aggregate into a tumbling aggregate and a SQL
> WINDOW operator. Maybe it makes sense to implement the WINDOW clause first
> and reuse it for sliding windows.
> 3. There is also a third, hybrid solution: do the pre-aggregation on the
> largest non-overlapping windows (as in 2), then replicate these results
> and process them as in approach 1. The benefits of this are that it a) is
> based on the implementation that supports non-combinable aggregates (which
> is required in any case) and b) does not require the implementation of the
> SQL WINDOW operator. Internally, this can again be implemented as an
> optimizer rule that translates the SlidingWindow into a pre-aggregating
> TumblingWindow and a final SlidingWindow (with replication).
> See FLINK-4692 for more discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
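
To make the difference between approaches 1 and 3 from the issue description concrete, here is a minimal sketch in plain Python. It is not Flink code and does not use any Flink API. The function names, the (key, timestamp, value) record layout, integer timestamps in minutes, window starts aligned to multiples of the slide, and the use of a sum as the combinable aggregate are all assumptions made for illustration.

```python
from collections import defaultdict
from math import gcd


def sliding_sums_replicated(records, size, slide):
    """Approach 1 (hypothetical sketch): replicate every record into each
    sliding window [start, start + size) that contains its timestamp."""
    result = defaultdict(int)
    for key, ts, value in records:
        start = ts - ts % slide          # latest window start <= ts
        while start > ts - size:         # window still covers ts
            result[(key, start)] += value
            start -= slide
    return dict(result)


def sliding_sums_hybrid(records, size, slide):
    """Approach 3 (hypothetical sketch): pre-aggregate on the largest
    non-overlapping panes, then replicate the partial results."""
    pane = gcd(size, slide)              # largest pane that tiles every window

    # Step 1: tumbling pre-aggregation, one partial sum per (key, pane).
    partials = defaultdict(int)
    for key, ts, value in records:
        partials[(key, ts - ts % pane)] += value

    # Step 2: replicate each partial into every sliding window covering
    # its pane and combine the partials there.
    result = defaultdict(int)
    for (key, pane_start), partial in partials.items():
        start = pane_start - pane_start % slide
        while start > pane_start - size:
            result[(key, start)] += partial
            start -= slide
    return dict(result)


if __name__ == "__main__":
    # Slide(10 minutes, 2 minutes) over three records of one key.
    records = [("a", 0, 1), ("a", 1, 2), ("a", 3, 4)]
    for window, total in sorted(sliding_sums_hybrid(records, 10, 2).items()):
        print(window, total)
```

Both functions return the same sums per (key, window start). The hybrid variant replicates one partial result per pane instead of one copy per input record, which is where the reduction in data volume would come from when many records fall into the same pane.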