How do network transmissions in Flink work?

Niklas Semmler Mon, 06 Jul 2015 14:59:25 -0700

Hello Flink Community,

I am working on a network scheduler and am currently reading Flink's code to figure out how the data exchange works. It would be great if you could help me with some of my issues and questions.

Basically I want to extract from Flink the time when a data transmission between two machines starts (1), their connection details (2), how much data is involved (3) and when it ends (4).
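To make the four items concrete, here is a minimal sketch of the record I would like to fill in per transmission. TransferRecord and all of its fields are hypothetical names of my own, not Flink classes, and the addresses in any usage are made up.

```java
import java.net.InetSocketAddress;

// Hypothetical record, not part of Flink: one entry per observed transmission.
final class TransferRecord {
    final long startMillis;            // (1) when the transmission starts
    final InetSocketAddress producer;  // (2) IP/port of the sending side
    final InetSocketAddress consumer;  // (2) IP/port of the receiving side
    final long bytes;                  // (3) how much data is involved
    final long endMillis;              // (4) when the transmission ends

    TransferRecord(long startMillis, InetSocketAddress producer,
                   InetSocketAddress consumer, long bytes, long endMillis) {
        // Reject records whose end precedes their start.
        if (endMillis < startMillis) {
            throw new IllegalArgumentException("end before start");
        }
        this.startMillis = startMillis;
        this.producer = producer;
        this.consumer = consumer;
        this.bytes = bytes;
        this.endMillis = endMillis;
    }

    // Wall-clock duration of the transmission in milliseconds.
    long durationMillis() {
        return endMillis - startMillis;
    }
}
```

The questions below are essentially about where in Flink each of these fields could be observed.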

So far I have understood that the scheduling of tasks is done via the scheduleOrUpdateConsumers JobManagerMessage. In the function of the same name in the class Execution I have been able to extract the IP/port pairs that the producer and the consumer(s) use.

Furthermore I understand that in the context of a "blocking" data transmission Flink will first create a ResultPartition and store all the data in the form of Buffers before starting the transmission. So in principle I should be able to figure out what amount of data Flink will communicate by looking at the respective ResultSubpartition.totalNumberOfBytes, right? However, in the process I would need to map each ResultSubpartition to a slot or deployed task, so that I can associate this amount of data with connection details of the sender and the receiver. Any hints on how to do that?

Now from what I see the same is not possible in a "pipelined" context, correct? Can anything be said about the data to be communicated?

Finally, I was unable to locate in the code and in the logs where a Task's state is changing from RUNNING to FINISHED. Could you give me a pointer?


It would be great if you could share your insights on the problems above ;).

Best regards,
Niklas

--
PhD Student / Research Assistant
INET, TU Berlin
Room 4.029
Marchstr 23
10587 Berlin
Tel: +49 30 314 78752
