Hi Gordon,

Thanks for the explanation; it is much clearer now. That looks like a much cleaner approach: this way, the driver program can run on a machine that does not need connectivity to all the worker nodes.
Regards,
Sourav

On Mon, Jan 11, 2016 at 9:22 AM, Tzu-Li (Gordon) Tai <tzuli...@gmail.com> wrote:
> Hi Sourav,
>
> A little more clarification on your last comment.
>
> In the sense of "where" the driver program is executed, yes, the Flink
> driver program runs in a mode similar to Spark's YARN-client mode.
>
> However, the "role" of the driver program and the work it is responsible
> for differ quite a bit between Flink and Spark. In Spark, the driver
> program is in charge of coordinating the Spark workers (executors) and
> must listen for and accept incoming connections from the workers
> throughout the job's lifetime. Therefore, in Spark's YARN-client mode,
> you must keep the driver program process alive, otherwise the job will
> be shut down.
>
> In Flink, on the other hand, the coordination of the Flink TaskManagers
> to complete a job is handled by Flink's JobManager once the client at
> the driver program submits the job to the JobManager. The driver program
> is used solely for job submission and can disconnect afterwards.
>
> As Stephan explained, if the user-defined dataflow retrieves any
> intermediate results via collect() or print(), those results are
> transmitted through the JobManager. Only then does the driver program
> need to stay connected, and even that connection is only to the
> JobManager, never to the workers (Flink TaskManagers).
>
> Cheers,
> Gordon
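To make the distinction concrete, here is a minimal sketch of a Flink driver program using the Java DataSet API. The class name, job name, and output path below are illustrative only (they do not come from the thread); the point is the two submission patterns Gordon describes.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class SubmitExample {

    public static void main(String[] args) throws Exception {
        // The client side only builds the dataflow graph; nothing runs here.
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Long> numbers = env.generateSequence(1, 1000);

        DataSet<Long> doubled = numbers.map(new MapFunction<Long, Long>() {
            @Override
            public Long map(Long n) {
                return n * 2;
            }
        });

        // Case 1: results go to a sink. execute() hands the job graph to the
        // JobManager, which coordinates the TaskManagers from then on; the
        // driver can disconnect once submission completes.
        doubled.writeAsText("hdfs:///tmp/doubled");  // hypothetical output path
        env.execute("doubling job");

        // Case 2: collect() (or print()) pulls results back to the client
        // through the JobManager, so the driver must stay connected until
        // the job finishes. Even then it never talks to the TaskManagers.
        // List<Long> results = doubled.collect();
    }
}

The design consequence is the one Sourav notes above: because collect()/print() results are routed through the JobManager rather than via worker-to-client connections, the machine running the driver only ever needs network reachability to the JobManager, not to the TaskManagers.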