Jacob Jona Fahlenkamp created FLINK-36553:
---------------------------------------------

             Summary: Local-global aggregation in DataStream API
                 Key: FLINK-36553
                 URL: https://issues.apache.org/jira/browse/FLINK-36553
             Project: Flink
          Issue Type: New Feature
          Components: API / DataStream
            Reporter: Jacob Jona Fahlenkamp


This feature would be useful in the DataStream API for the same reasons as in the Table API. AggregateFunction already has a merge method, which could be used to merge the local aggregates.

Batch jobs that read a large amount of raw data, key it, and then aggregate it (read -> keyBy -> aggregate) currently write all of the raw data to disk. The resulting disk requirements make it infeasible to run a batch job over a large span of data. With local-global aggregation, the data could be pre-aggregated before it is written to disk.

For streaming jobs, it would help with network throughput and data skew.
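
To illustrate the idea, here is a minimal sketch in plain Java with no Flink dependency. It is not a proposed implementation. The Agg interface is a hypothetical stand-in that mirrors the four methods of Flink's AggregateFunction (createAccumulator, add, getResult, merge). Event, SUM, localAggregate, and globalAggregate are illustrative names only. Each input list stands for the records seen by one upstream task before the shuffle.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LocalGlobalSketch {

    // Hypothetical stand-in with the same four methods as Flink's AggregateFunction.
    interface Agg<IN, ACC, OUT> {
        ACC createAccumulator();
        ACC add(IN value, ACC acc);
        OUT getResult(ACC acc);
        ACC merge(ACC a, ACC b);
    }

    // Illustrative record type: a key and a numeric value.
    static final class Event {
        final String key;
        final long value;
        Event(String key, long value) { this.key = key; this.value = value; }
    }

    // Example aggregate: sum of values per key.
    static final Agg<Event, Long, Long> SUM = new Agg<Event, Long, Long>() {
        public Long createAccumulator() { return 0L; }
        public Long add(Event e, Long acc) { return acc + e.value; }
        public Long getResult(Long acc) { return acc; }
        public Long merge(Long a, Long b) { return a + b; }
    };

    // Local phase: runs before the shuffle and emits one accumulator per key
    // seen in this partition, instead of forwarding every raw record.
    static <ACC> Map<String, ACC> localAggregate(List<Event> partition, Agg<Event, ACC, ?> agg) {
        Map<String, ACC> partials = new HashMap<>();
        for (Event e : partition) {
            ACC acc = partials.containsKey(e.key) ? partials.get(e.key) : agg.createAccumulator();
            partials.put(e.key, agg.add(e, acc));
        }
        return partials;
    }

    // Global phase: runs after the shuffle and combines the partial
    // accumulators for each key with merge(), then extracts the result.
    static <ACC, OUT> Map<String, OUT> globalAggregate(
            List<Map<String, ACC>> partialsPerPartition, Agg<Event, ACC, OUT> agg) {
        Map<String, ACC> merged = new HashMap<>();
        for (Map<String, ACC> partials : partialsPerPartition) {
            partials.forEach((key, acc) -> merged.merge(key, acc, agg::merge));
        }
        Map<String, OUT> results = new HashMap<>();
        merged.forEach((key, acc) -> results.put(key, agg.getResult(acc)));
        return results;
    }

    public static void main(String[] args) {
        List<Event> p1 = List.of(new Event("a", 1), new Event("a", 2), new Event("b", 5));
        List<Event> p2 = List.of(new Event("a", 10), new Event("b", 1));
        Map<String, Long> local1 = localAggregate(p1, SUM);
        Map<String, Long> local2 = localAggregate(p2, SUM);
        Map<String, Long> result = globalAggregate(List.of(local1, local2), SUM);
        System.out.println(result);
    }
}
```

In this toy input, five raw records become four partial accumulators before the merge step (two keys in each of two partitions), and the final sums are 13 for key "a" and 6 for key "b". The reduction grows with the number of records per key in each partition, which is where the disk and network savings described above would come from.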