Hi!

Flink is different from Spark in that respect. The driver program in Flink can submit a program to the master (on YARN, the ApplicationMaster) and then disconnect. The driver is not part of the distributed execution: that is coordinated solely by the master (the JobManager). The driver can stay connected to receive progress updates, though.
For programs that consist of multiple parallel executions (i.e., programs that contain count() or collect() calls), the driver needs to stay connected, because it has to pull the intermediate results. However, these results are all pulled/proxied through the master (JobManager), so the driver does not need to be able to connect to the workers. The only requirement for firewalled clusters is that two ports on the master node are reachable by the client.

Greetings,
Stephan

On Mon, Jan 11, 2016 at 5:18 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:

> I am going through the documentation of integrating Flink with YARN.
>
> However, I am not sure whether Flink can be run on YARN in two modes (like
> Spark). In one mode, the driver/client program of Flink is also managed by
> YARN. In the second mode, the client program is outside the control of
> YARN. Is running Flink behind firewalls like the second mode?
>
> Any clarification on this?
>
> Regards,
> Sourav
>