Re: Map and Reduce cycle

2015-07-17 Thread Fabian Hueske
Hi Bill, Flink uses pipelined data shipping by default. If you have a program like: source -> map -> reduce -> sink, the mapper will immediately start to shuffle data over the network to the reducer. The reducer collects that data and starts to sort it in batches. When the mapper is done and the r…
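
For readers skimming the archive, here is a minimal sketch of the source -> map -> reduce -> sink shape Fabian describes, written against the Flink DataSet API of that era. The input/output paths and the word-count-style functions are placeholder assumptions, not from the thread; the point is only the pipeline shape.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class MapReducePipeline {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // source: read lines from a (hypothetical) input file
        DataSet<String> source = env.readTextFile("/path/to/input");

        // map: emit (line, 1) pairs; with pipelined shipping, records are
        // shuffled to the reducers as soon as they are produced
        DataSet<Tuple2<String, Integer>> mapped = source.map(
            new MapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public Tuple2<String, Integer> map(String line) {
                    return new Tuple2<>(line, 1);
                }
            });

        // reduce: the receiving side collects and sorts incoming data in
        // batches while the mappers are still running
        DataSet<Tuple2<String, Integer>> reduced = mapped
            .groupBy(0)
            .sum(1);

        // sink: write the results
        reduced.writeAsCsv("/path/to/output");
        env.execute("map-reduce pipeline sketch");
    }
}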

Map and Reduce cycle

2015-07-17 Thread Bill Sparks
Does Flink require all the map tasks to finish before the reducers can proceed, as in Spark, or can the reducer operations start before all the mappers have finished, as in the older Hadoop MapReduce? Also, my understanding is that Flink manages its own heap; do you/we have a sense of the performan…
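
On the managed-heap part of the question: Flink preallocates a fraction of each TaskManager's JVM heap as managed memory and runs its sorting and hashing operators on serialized data in that region, which keeps most intermediate data out of reach of the garbage collector. A rough flink-conf.yaml sketch follows; the key names are from the 0.9-era documentation as best recalled, so treat the exact keys and values as assumptions to verify against your release:

# hypothetical flink-conf.yaml excerpt; values are illustrative only
taskmanager.heap.mb: 4096          # JVM heap size of each TaskManager, in MB
taskmanager.memory.fraction: 0.7   # share of the heap preallocated as Flink-managed memory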