Hi!

Flink differs from Spark in that respect. The driver program in Flink
can submit a program to the master (on YARN, the Application Master) and
then disconnect. It is not part of the distributed execution, which is
coordinated only by the master (JobManager).
The driver can stay connected to receive progress updates, though.

For programs that consist of multiple parallel executions (programs that
call count() or collect()), the driver needs to stay connected, because it
needs to pull the intermediate results. However, those results are all
pulled/proxied through the master (JobManager), so the driver does not need
to be able to connect to the workers. The only requirement for firewalled
clusters is that two ports on the master node are reachable by the client.
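
A minimal sketch of how those ports could be pinned in flink-conf.yaml so
that a firewall rule can be written for them. This assumes the two ports
are the Application Master/JobManager RPC port and the BLOB server port,
which the text above does not spell out. Whether these keys accept port
ranges depends on the Flink version, and the numbers are arbitrary
placeholders, so check the configuration docs for your release.

```yaml
# Sketch only: port values are placeholders, pick ones your firewall allows.
# RPC port of the Application Master / JobManager when running on YARN.
yarn.application-master.port: 50100-50200
# Port of the BLOB server running on the master.
blob.server.port: 50201-50300
```

With fixed ports or ranges like these, the firewall only needs to allow
client connections to the master node on those ports.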

Greetings,
Stephan


On Mon, Jan 11, 2016 at 5:18 PM, Sourav Mazumder <
sourav.mazumde...@gmail.com> wrote:

> I am going through the documentation of integrating Flink with YARN.
>
> However, I am not sure whether Flink can be run on YARN in two modes (like
> Spark). In one mode, the driver/client program of Flink is also managed by
> YARN. In the second mode, the client program is outside the control of
> YARN. Is running Flink behind firewalls like the second mode?
>
> Any clarification on this?
>
> Regards,
> Sourav
>
