Re: Running on a firewalled Yarn cluster?

Niels Basjes Mon, 02 Nov 2015 13:49:54 -0800

My take on those 3 options:
a) Bad idea; people need to be able to automate their jobs and run them
from the command line (e.g. bash, cron).
b) Bad idea, for the same reason you gave. In addition I do not want to
reserve an open 'flink port' for every user who wants to run a job.
c) From my perspective this sounds like the most viable solution.


I don't know how they implemented this in MR.
I know the way they did it actually works on our clusters (with firewalls).
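
To illustrate the port trade-off behind option b) quoted below, here is a
minimal Java sketch. It is not Flink code: the class and method names are
made up for illustration, and it only assumes plain java.net sockets. Binding
to port 0 lets the OS hand out a distinct free port per process (which a
firewall cannot whitelist in advance), while a second bind to an already-used
fixed port fails.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical demo class, not part of Flink.
class PortDemo {

    /** Binds two sockets to port 0; the OS assigns each a different free port. */
    static int[] bindTwoEphemeral() throws IOException {
        try (ServerSocket a = new ServerSocket(0);
             ServerSocket b = new ServerSocket(0)) {
            return new int[] {a.getLocalPort(), b.getLocalPort()};
        }
    }

    /** Returns true if a second bind to an already-bound port is rejected. */
    static boolean secondFixedBindFails() throws IOException {
        try (ServerSocket first = new ServerSocket(0)) {
            // Stand-in for a configured fixed port that is already in use.
            int fixedPort = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(fixedPort)) {
                return false; // both binds succeeded
            } catch (BindException e) {
                return true;  // address already in use
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] ports = bindTwoEphemeral();
        System.out.println("ephemeral ports: " + ports[0] + ", " + ports[1]);
        System.out.println("second fixed bind fails: " + secondFixedBindFails());
    }
}
```

This is why a fixed port and several job managers on one machine do not mix,
and why a random port and a strict firewall do not mix either.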

Niels Basjes

On Mon, Nov 2, 2015 at 4:34 PM, Robert Metzger <rmetz...@apache.org> wrote:

> Hi Niels,
>
> so the problem is that you cannot submit a job to Flink using the
> "/bin/flink" tool, right?
> I assume Flink and its TaskManagers properly start and connect to each
> other (the number of TaskManagers is shown correctly in the web interface).
>
> I see the following solutions for the problem:
> a) Add a new page in the job manager web frontend allowing users to upload
> and execute a jar with a Flink job
> b) Add options for starting the job manager and blob manager in the job
> manager container on fixed ports
> c) Somehow route the Akka RPC requests and blob manager uploads over HTTP
> through the YARN proxy
>
> The reason why we use a free port instead of a fixed port is that this way
> two job manager containers can run on the same machine. So solution b)
> would only work if users are not running multiple Flink jobs / sessions on
> YARN at the same time (or if you somehow make sure they are not running on
> the same machine).
>
> What's your take on the three solutions?
>
> Does anybody here know how MR is doing it? Are they running the
> ApplicationMaster RPC on a fixed port? Do they use HTTP-based calls over
> the proxy?
>
> Robert
>
>
>
> On Mon, Nov 2, 2015 at 4:05 PM, Niels Basjes <ni...@basjes.nl> wrote:
>
>> Hi,
>>
>> Here at work our security guys chose (a long time ago) to only open the
>> firewall ports that are actually needed (I say: good call!).
>> For the Yarn cluster this includes things like the proxy to see the
>> application master of an application.
>> For everything we've done so far (i.e. mr/pig/...) this has worked fine.
>>
>> Now with Flink I run into problems:
>> When I run either the yarn-session or a job on Yarn the application
>> master gets started and I can see the web interface.
>> The problem is that the jobmanager.rpc.address is on one of the worker
>> nodes and the jobmanager.rpc.port is essentially a random value.
>> A random value which is not accessible because of the firewall rules.
>> So I cannot reach the jobmanager on the yarn cluster.
>>
>> How do I tackle this, assuming that opening all ports on the firewall
>> is not an option?
>>
>> Or is this something that should be handled by Flink? (Perhaps the
>> application master can proxy the RPC calls?)
>>
>> --
>> Best regards / Met vriendelijke groeten,
>>
>> Niels Basjes
>>
>
>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
