Jacob Jona Fahlenkamp created FLINK-36553:
---------------------------------------------
Summary: Local Global aggregation in Datasream API
Key: FLINK-36553
URL: https://issues.apache.org/jira/browse/FLINK-36553
Project: Flink
Issue Type: New Feature
Components: API / DataStream
Reporter: Jacob Jona Fahlenkamp
This would be useful for the same reasons as in Table API. AggregateFunctions
already have a merge method, which could be used to merge the local aggregates.
Batch jobs where you read a large amount of raw data -> key by -> aggregate
currently write all the raw data to disk. This makes it infeasible to run a
batch job over a large span of data, because of large disk requirements. If we
had local global-aggregation the data could be preaggregated already, before
being written to disk.
For stream jobs it would help with network throughput and data skew.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)