I think this is independent of streaming. If you want to compute an aggregate 
over all keys and all data, the final combination has to happen in a single 
task, e.g. use a (flat)map with parallelism 1, do the aggregation there, and 
then broadcast the result to downstream operators. Does this make sense, or am 
I overlooking something?
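
To make the idea concrete, here is a rough sketch in plain Python (not Flink code) of the three steps: aggregate locally per task, combine in one place, broadcast back. The function names, the sample partitions and the choice of a sum as the aggregate are all made up for illustration. It also assumes a finite input; on an unbounded stream you would have to decide when the aggregate counts as "complete", e.g. per window.

```python
from functools import reduce

def local_aggregate(partition):
    # Runs once per parallel task: aggregate only the records this task sees.
    return sum(partition)

def global_aggregate(local_results):
    # Runs in a single task (parallelism 1): combine all local aggregates.
    return reduce(lambda a, b: a + b, local_results, 0)

def broadcast(value, num_tasks):
    # Every downstream task receives its own copy of the global aggregate.
    return [value] * num_tasks

# Hypothetical input: records already split across three parallel tasks.
partitions = [[1, 2, 3], [4, 5], [6]]

local_results = [local_aggregate(p) for p in partitions]
total = global_aggregate(local_results)
per_task_view = broadcast(total, len(partitions))

# Each task continues processing its own records using the global value,
# here by expressing every record as a fraction of the global sum.
normalized = [[x / g for x in p] for p, g in zip(partitions, per_task_view)]
```

The single-task combine step is the bottleneck by design: only the small local aggregates flow through it, not the raw records.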

On 12 November 2016 at 12:18:04, Felix Neutatz (neut...@googlemail.com) wrote:
> > want to calculate a local aggregation for each task and then 
> combine all these local aggregates to one global aggregate and 
> push this global aggregate to all nodes and continue processing 
> the data stream. If you don't understand my description, I also 
> made some drawings of what I mean: 
> https://docs.google.com/presentation/d/13ei6pzhwNKqNShhdNWXqJaYCG1z0Hsrxfy5sRnqun5M/edit?usp=sharing
