Hi, (copying my answer from Stack Overflow)
The current release of Flink (Flink 1.4.0, Dec 2017) does not feature built-in support for pre-aggregations. However, there are efforts on the way to add this for the next release (1.5.0), see FLINK-7561 [4] You can implement a pre-aggregation operation based on a ProcessFunction [1]. The ProcessFunction could keep the pre-aggregates in a HashMap (of fixed size) in memory and register timers event-time and processing-time) to periodically emit the pre-aggregates. The state (i.e., content of the `HashMap`) should be persisted in managed operator state [2] to prevent data loss in case of a failure. When setting the timers, you need to respect the window boundaries. Please note that FoldFunction has been deprecated and should be replaced by AggregateFunction [3]. Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/process_function.html#the-processfunction [2] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/state/state.html#operator-state [3] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/windows.html#aggregatefunction [4] https://issues.apache.org/jira/browse/FLINK-7561 2017-12-15 8:04 GMT+01:00 zhaifengwei <zhaifengwei2...@163.com>: > I have a cluster environment, I need aggregate dataStream on it. > I`m wonder whether I can aggregate in local server first, then aggregate in > global. > When I aggregate dataStream in global, the Network IO will increase fast. > I just want decrease the Network IO, So I need aggregate in local server > first. > How can I do it. > > DataStream<String> dataIn.... > dataIn.map().filter().assignTimestampsAndWatermarks( > ).keyBy().window().Fold() > > > > -- > Sent from: http://apache-flink-user-mailing-list-archive.2336050. > n4.nabble.com/ >