Jacob Jona Fahlenkamp created FLINK-36553:
---------------------------------------------

             Summary: Local Global aggregation in Datasream API
                 Key: FLINK-36553
                 URL: https://issues.apache.org/jira/browse/FLINK-36553
             Project: Flink
          Issue Type: New Feature
          Components: API / DataStream
            Reporter: Jacob Jona Fahlenkamp


This would be useful for the same reasons as in Table API. AggregateFunctions 
already have a merge method, which could be used to merge the local aggregates.

Batch jobs where you read a large amount of raw data -> key by -> aggregate 
currently write all the raw data to disk. This makes it infeasible to run a 
batch job over a large span of data, because of large disk requirements. If we 
had local global-aggregation the data could be preaggregated already, before 
being written to disk.

For stream jobs it would help with network throughput and data skew.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to