Hi All,

This is more of a general question. How are tasks synchronized in batch 
execution? If, for example, we ran an iterative pipeline (map1 -> reduce1 -> 
reduce2 -> map2), and the first two operators (map1->reduce1) were chained, how 
would reduce2 be notified that map1 -> reduce1 have completed their execution 
so that it can start reading its input data? I noticed that the driver classes 
(MapDriver, ChainedReduceDriver, etc.) have input and output counters 
(numRecordsIn, numRecordsOut). Are these used to check whether an operator has 
consumed all of its data?

Thank you in advance.

Best Wishes,
Mary
