Hi Mary,

Flink uses an alignment mechanism for synchronization. At the end of a
round, every upstream task (for example, reduce1) sends a message to all
of its downstream tasks to signal that it has processed all of its data.
Once the downstream task (reduce2) has collected these messages from all
of its upstream tasks, it knows that all of its input data has arrived,
and it can then process its inputs.
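
To make the idea concrete, here is a toy sketch in Python of the
downstream side. It is not Flink code: END_OF_INPUT, DownstreamTask, and
the upstream ids are hypothetical names made up for illustration. It only
shows the counting logic, assuming each upstream sends exactly one
end-of-input message after its last record.

```python
from collections import defaultdict

END_OF_INPUT = object()  # hypothetical end-of-data marker, not a Flink class


class DownstreamTask:
    """Toy model of reduce2: buffers records until every upstream has finished."""

    def __init__(self, upstream_ids):
        self.pending = set(upstream_ids)  # upstreams that have not finished yet
        self.buffer = []                  # records received so far
        self.result = None                # set once all inputs are complete

    def receive(self, upstream_id, message):
        if message is END_OF_INPUT:
            self.pending.discard(upstream_id)
            if not self.pending:
                # Every upstream has reported completion: safe to process.
                self.result = self.process(self.buffer)
        else:
            self.buffer.append(message)

    @staticmethod
    def process(records):
        # Stand-in for the reduce function: sum the values per key.
        totals = defaultdict(int)
        for key, value in records:
            totals[key] += value
        return dict(totals)


task = DownstreamTask(upstream_ids=["reduce1-0", "reduce1-1"])
task.receive("reduce1-0", ("a", 1))
task.receive("reduce1-1", ("a", 2))
task.receive("reduce1-0", END_OF_INPUT)
assert task.result is None  # reduce1-1 has not finished yet
task.receive("reduce1-1", ("b", 5))
task.receive("reduce1-1", END_OF_INPUT)
print(task.result)  # {'a': 3, 'b': 5}
```

Note that the sketch decides completion from the end-of-input messages
alone; it does not compare record counts.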

Hope this helps.

Best,
Guowei


On Mon, Apr 19, 2021 at 5:17 PM Maria Xekalaki <
maria.xekal...@manchester.ac.uk> wrote:

> Hi All,
>
> This is more of a general question. How are tasks synchronized in batch
> execution? If, for example, we ran an iterative pipeline (map1 -> reduce1
> -> reduce2 -> map2), and the first two operators (map1->reduce1) were
> chained, how would reduce2 be notified that map1 -> reduce1 have completed
> their execution so as to start reading its input data? I noticed that in
> the driver classes (MapDriver, ChainedReduceDriver etc.) there are input
> and output counters (numRecordsOut, numRecordsIn). Are these used to check
> if an operator has consumed all of its data?
>
> Thank you in advance.
>
> Best Wishes,
> Mary
>
