Hi, I just wanted to let you know that I didn't forget about this! The BlobManager in 1.0-SNAPSHOT already has a configuration parameter to use a certain range of ports. I'm planning to add the same feature for YARN tomorrow. Sorry for the delay.
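[Editor's note: the "allocate a port within a user-specified range" idea discussed throughout this thread can be sketched with plain JDK sockets. This is a hypothetical illustration, not Flink's actual implementation; the class and method names are invented, and the range values are arbitrary.]

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical sketch: scan a configured range and report the first port
// that accepts a bind. A real server would keep the successfully bound
// socket open instead of closing and re-binding (to avoid a race where
// another process grabs the port in between).
public class PortRangeAllocator {

    /** Returns the first port in [from, to] that accepts a bind, or -1 if none is free. */
    public static int findFreePort(int from, int to) {
        for (int port = from; port <= to; port++) {
            try (ServerSocket socket = new ServerSocket(port)) {
                return socket.getLocalPort(); // bind succeeded, so the port was free
            } catch (IOException e) {
                // port is in use or not permitted; try the next one in the range
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int port = findFreePort(50100, 50200);
        System.out.println("Allocated port: " + port);
    }
}
```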
On Tue, Nov 10, 2015 at 9:27 PM, Cory Monty <cory.mo...@getbraintree.com> wrote:

> Thanks, Stephan.
>
> I'll give those two workarounds a try!
>
> On Tue, Nov 10, 2015 at 2:18 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> Hi Cory!
>>
>> There is no flag to define the BlobServer port right now, but we should
>> definitely add this: https://issues.apache.org/jira/browse/FLINK-2996
>>
>> If your setup is such that the firewall problem is only between client
>> and master node (and the workers can reach the master on all ports), then
>> you can try two workarounds:
>>
>> 1) Start the program in the cluster (or on the master node, via ssh).
>>
>> 2) Add the program jar to the lib directory of Flink, and start your
>> program with the RemoteExecutor, without a jar attachment. Then it only
>> needs to communicate with the actor system (RPC) port, which is not
>> random in standalone mode (6123 by default).
>>
>> Stephan
>>
>> On Tue, Nov 10, 2015 at 8:46 PM, Cory Monty <cory.mo...@getbraintree.com> wrote:
>>
>>> I'm also running into an issue with a non-YARN cluster. When submitting
>>> a JAR to Flink, we'll need to have an arbitrary port open on all of the
>>> hosts, which we don't know about until the socket attempts to bind; a
>>> bit of a problem for us.
>>>
>>> Are there ways to submit a JAR to Flink that bypass the need for the
>>> BlobServer's random port binding? Or to control the port the BlobServer
>>> binds to?
>>>
>>> Cheers,
>>>
>>> Cory
>>>
>>> On Thu, Nov 5, 2015 at 8:07 AM, Niels Basjes <ni...@basjes.nl> wrote:
>>>
>>>> That is what I tried. I couldn't find that port, though.
>>>>
>>>> On Thu, Nov 5, 2015 at 3:06 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Cool, that's good news.
>>>>>
>>>>> The RM proxy is only for the web interface of the AM.
>>>>>
>>>>> I'm pretty sure that the MapReduce AM has at least two ports:
>>>>> - one for the web interface (accessible through the RM proxy, so
>>>>> behind the firewall)
>>>>> - one for the AM RPC (and that port is allocated within the
>>>>> configured range, open through the firewall).
>>>>>
>>>>> You can probably find the RPC port in the log file of the running
>>>>> MapReduce AM (to find that, identify the NodeManager running the AM,
>>>>> access the NM web interface, and retrieve the logs of the container
>>>>> running the AM).
>>>>>
>>>>> Maybe the MapReduce client also logs the AM RPC port when querying
>>>>> the status of a running job.
>>>>>
>>>>> On Thu, Nov 5, 2015 at 2:59 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I checked, and this setting has been set to a limited range of
>>>>>> only 100 port numbers.
>>>>>>
>>>>>> I tried to find the actual port an AM is running on and couldn't
>>>>>> find it (I'm not the admin on that cluster).
>>>>>>
>>>>>> The URL to the AM that I use to access it always looks like this:
>>>>>>
>>>>>> http://master-001.xxxxxx.net:8088/proxy/application_1443166961758_85492/index.html
>>>>>>
>>>>>> As you can see, I never connect directly; always via the proxy that
>>>>>> runs on the master on a single fixed port.
>>>>>>
>>>>>> Niels
>>>>>>
>>>>>> On Thu, Nov 5, 2015 at 2:46 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>
>>>>>>> While discussing the issue with my colleagues today, we came up
>>>>>>> with another approach to resolve it:
>>>>>>>
>>>>>>> d) Upload the job jar to HDFS (or another FS) and trigger the
>>>>>>> execution of the jar using an HTTP request to the web interface.
>>>>>>>
>>>>>>> We could add some tooling to the /bin/flink client to submit a job
>>>>>>> like this transparently, so users would not need to bother with the
>>>>>>> file upload and request sending.
>>>>>>> Also, Sachin started a discussion on the dev@ list about adding
>>>>>>> support for submitting jobs over the web interface, so maybe we can
>>>>>>> base the fix for FLINK-2960 on that.
>>>>>>>
>>>>>>> I've also looked into the Hadoop MapReduce code, and it seems they
>>>>>>> do the following:
>>>>>>> When submitting a job, they upload the job jar file to HDFS. They
>>>>>>> also upload a configuration file that contains all the config
>>>>>>> options of the job. Then they submit this altogether as an
>>>>>>> application to YARN. So far, no firewall has been involved. They
>>>>>>> establish a connection between the JobClient and the
>>>>>>> ApplicationMaster when the user queries the current job status, but
>>>>>>> I could not find any special code getting the status over HTTP.
>>>>>>>
>>>>>>> But I found the following configuration parameter:
>>>>>>> "yarn.app.mapreduce.am.job.client.port-range", so it seems that
>>>>>>> they try to allocate the AM port within that range (if specified).
>>>>>>> Niels, can you check whether this configuration parameter is set in
>>>>>>> your environment? I assume your firewall allows outside connections
>>>>>>> from that port range.
>>>>>>>
>>>>>>> So we also have a new approach:
>>>>>>>
>>>>>>> f) Allocate the YARN application master (and blob manager) within a
>>>>>>> user-specified port range.
>>>>>>>
>>>>>>> This would be really easy to implement, because we would just need
>>>>>>> to go through the range until we find an available port.
>>>>>>>
>>>>>>> On Tue, Nov 3, 2015 at 1:06 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>>>>>>
>>>>>>>> Great!
>>>>>>>>
>>>>>>>> I'll watch the issue and give it a test once I see a working patch.
>>>>>>>>
>>>>>>>> Niels Basjes
>>>>>>>>
>>>>>>>> On Tue, Nov 3, 2015 at 1:03 PM, Maximilian Michels <m...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi Niels,
>>>>>>>>>
>>>>>>>>> Thanks a lot for reporting this issue.
>>>>>>>>> I think it is a very common setup in corporate infrastructure to
>>>>>>>>> have restrictive firewall settings. For Flink 1.0 (and probably a
>>>>>>>>> minor 0.10.x release) we will have to address this issue to
>>>>>>>>> ensure proper integration of Flink.
>>>>>>>>>
>>>>>>>>> I've created a JIRA issue to keep track:
>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-2960
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Max
>>>>>>>>>
>>>>>>>>> On Tue, Nov 3, 2015 at 11:02 AM, Niels Basjes <ni...@basjes.nl> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I forgot to answer your other question:
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 2, 2015 at 4:34 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> So the problem is that you cannot submit a job to Flink using
>>>>>>>>>>> the "/bin/flink" tool, right?
>>>>>>>>>>> I assume Flink and its TaskManagers properly start and connect
>>>>>>>>>>> to each other (the number of TaskManagers is shown correctly in
>>>>>>>>>>> the web interface).
>>>>>>>>>>
>>>>>>>>>> Correct. Flink starts (I see the JobManager UI) but the actual
>>>>>>>>>> job is not started.
>>>>>>>>>>
>>>>>>>>>> Niels Basjes
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards / Met vriendelijke groeten,
>>>>>>>>
>>>>>>>> Niels Basjes
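[Editor's note: the Hadoop configuration parameter Robert mentions above is set in mapred-site.xml. A sketch of what such an entry might look like, with an illustrative range (the actual value must match the ports opened in your firewall):]

```xml
<!-- mapred-site.xml: restrict the MapReduce AM's client RPC port to a
     firewall-friendly range. The 50100-50200 range is only an example. -->
<property>
  <name>yarn.app.mapreduce.am.job.client.port-range</name>
  <value>50100-50200</value>
</property>
```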