Hello,

I have gathered some questions while reading the documentation on Flink's 
internals, and I would be grateful for your answers.

1) How is the JobManager involved in the communication between tasks running in 
task slots on TaskManagers?

From [1] it appears to me that, as part of the control flow for data 
exchange, every time a task has a result that is consumable, it notifies the 
JobManager. This is repeated in [2]: "When producing an intermediate result, 
the producing task is responsible to notify the JobManager about available 
data". However, in the case of data streams a result may consist of a single 
record, so I ask this question to better understand whether the JobManager 
is involved in every single data exchange.
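
To make this concrete, below is a small Java sketch of how I currently 
picture that control flow. Every class and method name in it is a 
placeholder of mine, not Flink's actual API, so please correct it wherever 
my mental model is wrong.

    // Sketch of my current mental model of the control flow from [1]/[2].
    // All names here are placeholders, not real Flink classes.
    public class ControlFlowSketch {

        // Stand-in for whatever identifies a produced intermediate result.
        static class ResultId {
            final String id;
            ResultId(String id) { this.id = id; }
        }

        // Stand-in for the producer's control channel to the JobManager.
        interface JobManagerGateway {
            void notifyResultConsumable(ResultId resultId);
        }

        static class ProducerTask {
            private final JobManagerGateway jobManager;
            ProducerTask(JobManagerGateway jm) { this.jobManager = jm; }

            // Control plane: one notification once the result is consumable.
            // My question is whether, for streams, this effectively happens
            // for every piece of data.
            void onResultConsumable(ResultId resultId) {
                jobManager.notifyResultConsumable(resultId);
                // Data plane: the records themselves then travel directly
                // between TaskManagers, not through the JobManager.
            }
        }

        public static void main(String[] args) {
            ProducerTask producer = new ProducerTask(
                r -> System.out.println("JobManager notified about " + r.id));
            producer.onResultConsumable(new ResultId("intermediate-result-1"));
        }
    }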

2) In case my understanding of the control flow above is correct, which 
unit of data triggers the notification to the JobManager: a ResultPartition, 
a ResultSubpartition, or a Buffer?
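
For reference, the containment I took away from [1] is sketched below; only 
the three class names come from the documentation, while the fields are 
invented for illustration.

    // Containment as I understood it from [1]. Only the names
    // ResultPartition / ResultSubpartition / Buffer come from the docs;
    // the fields are made up.
    import java.util.ArrayList;
    import java.util.List;

    public class ResultHierarchySketch {

        // A Buffer holds a chunk of serialized records.
        static class Buffer {
            final byte[] data;
            Buffer(byte[] data) { this.data = data; }
        }

        // One ResultSubpartition per consuming parallel subtask.
        static class ResultSubpartition {
            final List<Buffer> buffers = new ArrayList<>();
        }

        // One ResultPartition per intermediate result produced by a task.
        static class ResultPartition {
            final ResultSubpartition[] subpartitions;
            ResultPartition(int numConsumers) {
                subpartitions = new ResultSubpartition[numConsumers];
                for (int i = 0; i < numConsumers; i++) {
                    subpartitions[i] = new ResultSubpartition();
                }
            }
        }

        public static void main(String[] args) {
            // Which of the three levels triggers the notification to the
            // JobManager: the partition, one subpartition, or each buffer?
            ResultPartition partition = new ResultPartition(4);
            partition.subpartitions[0].buffers.add(new Buffer(new byte[4096]));
            System.out.println("buffers in subpartition 0: "
                + partition.subpartitions[0].buffers.size());
        }
    }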

3) Likewise, if so, are Akka actors involved in this notification of data 
availability for consumption?
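
To clarify what I mean by "involved": I am imagining something along the 
lines of the generic Akka sketch below (classic Java API, assuming an Akka 
2.3.x-style dependency on the classpath). The PartitionConsumable message 
type is entirely made up by me and is not a real Flink message.

    // Generic Akka sketch of what I imagine by "the actors are involved".
    // PartitionConsumable is a made-up message type, not Flink's.
    import java.io.Serializable;

    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.actor.Props;
    import akka.actor.UntypedActor;

    public class NotificationSketch {

        // Hypothetical message a producing task sends to the JobManager actor.
        static class PartitionConsumable implements Serializable {
            final String partitionId;
            PartitionConsumable(String partitionId) { this.partitionId = partitionId; }
        }

        // Stand-in for the JobManager actor receiving such notifications.
        public static class JobManagerSketch extends UntypedActor {
            @Override
            public void onReceive(Object message) {
                if (message instanceof PartitionConsumable) {
                    PartitionConsumable msg = (PartitionConsumable) message;
                    // Presumably: look up the consumers of this partition and
                    // inform (or first deploy) them so they can request the data.
                    System.out.println("consumable: " + msg.partitionId);
                } else {
                    unhandled(message);
                }
            }
        }

        public static void main(String[] args) {
            ActorSystem system = ActorSystem.create("sketch");
            ActorRef jobManager =
                system.actorOf(Props.create(JobManagerSketch.class), "jobmanager");
            jobManager.tell(new PartitionConsumable("partition-1"), ActorRef.noSender());
            system.shutdown();   // Akka 2.3.x-style shutdown of the demo system
        }
    }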

4) How is the co-existence of receiver-initiated data transfers (the 
consumer requests the data partitions) and push-based data transfers [2] 
between two tasks orchestrated?
From [3] I retain the following paragraph (I try to sketch my reading of it 
right after the quote):

"The JobManager also acts as the input split assigner for data sources. It is 
responsible for distributing the work across all TaskManager such that data 
locality is preserved where possible. In order to dynamically balance the load, 
the Tasks request a new input split after they have finished processing the old 
one. This request is realized by sending a RequestNextInputSplit to the 
JobManager. The JobManager responds with a NextInputSplit message. If there are 
no more input splits, then the input split contained in the message is null.

The Tasks are deployed lazily to the TaskManagers. This means that tasks which 
consume data are only deployed after one of its producers has finished 
producing some data. Once the producer has done so, it sends a 
ScheduleOrUpdateConsumers message to the JobManager. This messages says that 
the consumer can now read the newly produced data. If the consuming task is not 
yet running, it will be deployed to a TaskManager."
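
Tying questions 3) and 4) together, the sketch below is how I read that 
paragraph. The message names (RequestNextInputSplit, NextInputSplit, 
ScheduleOrUpdateConsumers) come from the quoted text; their payloads and the 
plain-Java dispatch are only my guesses, not Flink code.

    // Sketch of the two exchanges described in [3]: the pull-based input
    // split assignment for sources and the lazy deployment of consumers.
    public class JobManagerProtocolSketch {

        // Source task -> JobManager: "give me the next piece of input".
        static class RequestNextInputSplit { }

        // JobManager -> source task: the next split, or null if none remain.
        static class NextInputSplit {
            final String inputSplit;              // placeholder for a real InputSplit
            NextInputSplit(String inputSplit) { this.inputSplit = inputSplit; }
        }

        // Producing task -> JobManager: "my result has data, deploy or
        // notify the consumers".
        static class ScheduleOrUpdateConsumers {
            final String resultPartitionId;       // placeholder for a partition id
            ScheduleOrUpdateConsumers(String id) { this.resultPartitionId = id; }
        }

        // Stand-in for the JobManager's handling of these messages.
        static Object jobManagerReceive(Object message) {
            if (message instanceof RequestNextInputSplit) {
                // Pull-based load balancing for sources: hand out the next
                // split, or a null split once the input is exhausted.
                return new NextInputSplit("split-42");
            } else if (message instanceof ScheduleOrUpdateConsumers) {
                // Lazy deployment: deploy the consuming tasks if they are not
                // running yet, otherwise tell them new data is available.
                return null;
            }
            return null;
        }

        public static void main(String[] args) {
            NextInputSplit reply =
                (NextInputSplit) jobManagerReceive(new RequestNextInputSplit());
            System.out.println("next input split: " + reply.inputSplit);
            jobManagerReceive(new ScheduleOrUpdateConsumers("partition-1"));
        }
    }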

5) Could you please summarize how this data exchange occurs in Flink, using 
a concrete example?

Thank you!

Best regards,
Camelia


[1] https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
[2] https://cwiki.apache.org/confluence/display/FLINK/Network+Stack%2C+Plugable+Data+Exchange
[3] https://cwiki.apache.org/confluence/display/FLINK/Akka+and+Actors
