Hi Gordon,

Thanks for the explanation. It is much clearer now, and it looks like a
much cleaner approach. That way, the driver program can run on a machine
that does not need connectivity to all the worker nodes.

Regards,
Sourav

On Mon, Jan 11, 2016 at 9:22 AM, Tzu-Li (Gordon) Tai <tzuli...@gmail.com>
wrote:

> Hi Sourav,
>
> To add a little more clarification on your last comment:
>
> In terms of "where" the driver program is executed, then yes, the Flink
> driver program runs in a mode similar to Spark's YARN-client mode.
>
> However, the "role" of the driver program and the work it is
> responsible for differ quite a bit between Flink and Spark. In Spark,
> the driver program is in charge of coordinating the Spark workers
> (executors) and must listen for and accept incoming connections from
> the workers throughout the job's lifetime. Therefore, in Spark's
> YARN-client mode, you must keep the driver program process alive,
> otherwise the job will be shut down.
>
> In Flink, by contrast, the coordination of the Flink TaskManagers to
> complete a job is handled by Flink's JobManager once the client in the
> driver program submits the job to the JobManager. The driver program is
> used solely for job submission and can disconnect afterwards.
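>
> For illustration, here is a minimal sketch of such a job (the class
> name, sample data, and output path are hypothetical; it assumes the
> Java batch DataSet API):
>
>     import org.apache.flink.api.java.DataSet;
>     import org.apache.flink.api.java.ExecutionEnvironment;
>
>     public class DetachedJob {
>         public static void main(String[] args) throws Exception {
>             // The job graph is shipped to the JobManager on execute();
>             // the TaskManagers never contact this client process.
>             ExecutionEnvironment env =
>                 ExecutionEnvironment.getExecutionEnvironment();
>
>             DataSet<String> text =
>                 env.fromElements("to", "be", "or", "not", "to", "be");
>
>             // Write results to a sink instead of collecting them back
>             // to the client, so the client can disconnect once the
>             // job has been submitted.
>             text.writeAsText("hdfs:///tmp/example-output");
>
>             env.execute("Detached example");
>         }
>     }
>
> Since nothing is collected back to the client, it can go away right
> after submission, e.g. by submitting in the CLI's detached mode with
> bin/flink run -d.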
>
> As Stephan explained, if the user-defined dataflow retrieves any
> intermediate results via collect() or print(), those results are
> transmitted through the JobManager. Only then does the driver program
> need to stay connected. Note that even then the client needs no
> connection to the workers (the Flink TaskManagers), only to the
> JobManager.
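>
> And a corresponding sketch where the client must stay connected until
> the results come back (again a hypothetical example, using the DataSet
> API's collect()):
>
>     import java.util.List;
>
>     import org.apache.flink.api.java.DataSet;
>     import org.apache.flink.api.java.ExecutionEnvironment;
>
>     public class CollectExample {
>         public static void main(String[] args) throws Exception {
>             ExecutionEnvironment env =
>                 ExecutionEnvironment.getExecutionEnvironment();
>
>             DataSet<Integer> numbers = env.fromElements(1, 2, 3, 4, 5);
>
>             // collect() triggers execution and routes the result set
>             // back through the JobManager, so the client has to stay
>             // connected until it returns; even so, the connection is
>             // only to the JobManager, never to the TaskManagers.
>             List<Integer> result = numbers.collect();
>             System.out.println(result);
>         }
>     }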
>
> Cheers,
> Gordon
