I found the solution here: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Regarding-Broadcast-of-datasets-in-streaming-context-td6456.html
2016-11-14 9:41 GMT+01:00 Ufuk Celebi <u...@apache.org>:
> I think this is independent of streaming. If you want to compute the
> aggregate over all keys and data you need to do this in a single task, e.g.
> use a (flat)map with parallelism 1, do the aggregation there and then
> broadcast to downstream operators. Does this make sense or am I overlooking
> something?
>
> On 12 November 2016 at 12:18:04, Felix Neutatz (neut...@googlemail.com)
> wrote:
>
> > want to calculate a local aggregation for each task and then
> > combine all these local aggregates to one global aggregate and
> > push this global aggregate to all nodes and continue processing
> > the data stream. If you don't understand my description, I also
> > made some drawings of what I mean: https://docs.google.com/presentation/d/13ei6pzhwNKqNShhdNWXqJaYCG1z0Hsrxfy5sRnqun5M/edit?usp=sharing
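The dataflow discussed in this thread (local aggregate per task, then one global aggregate computed in a single task, then that result broadcast back to every downstream task) can be simulated in plain Python. This is a minimal sketch, not Flink code: the partitions, the choice of word counting as the aggregate, and every function name below are hypothetical. In Flink terms, `global_aggregate` plays the role of the parallelism-1 (flat)map Ufuk suggests, and `broadcast` stands in for sending its output to all downstream operator instances.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical: each inner list is the data one parallel task sees.
Partition = List[str]


def local_aggregate(partition: Partition) -> Counter:
    """Step 1: each task aggregates only its own partition (here: word counts)."""
    return Counter(partition)


def global_aggregate(local_results: List[Counter]) -> Counter:
    """Step 2: a single task (parallelism 1) merges all local aggregates."""
    total = Counter()
    for local in local_results:
        total.update(local)  # Counter.update adds counts rather than replacing them
    return total


def broadcast(value: Counter, n_tasks: int) -> List[Counter]:
    """Step 3: every downstream task receives the same global aggregate."""
    return [value] * n_tasks  # same object shared; fine because it is only read


def continue_processing(partition: Partition, global_counts: Counter) -> List[Tuple[str, float]]:
    """Step 4: downstream tasks use the global result, e.g. relative frequency."""
    n = sum(global_counts.values())
    return [(word, global_counts[word] / n) for word in partition]


def run_pipeline(partitions: List[Partition]) -> Tuple[Counter, List[List[Tuple[str, float]]]]:
    """Wire the four steps together and return (global aggregate, per-task results)."""
    local_results = [local_aggregate(p) for p in partitions]
    global_counts = global_aggregate(local_results)
    copies = broadcast(global_counts, len(partitions))
    results = [continue_processing(p, g) for p, g in zip(partitions, copies)]
    return global_counts, results


if __name__ == "__main__":
    example = [["a", "b", "a"], ["b", "c"], ["a"]]
    counts, per_task = run_pipeline(example)
    print(counts)
    print(per_task)
```

The merge step is deliberately a single point: it is what makes the result global, and it is also the part that cannot scale out, which is why Ufuk notes the same reasoning applies regardless of streaming.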