I'm also running into an issue with a non-YARN cluster. When submitting a JAR to Flink, an arbitrary port needs to be open on all of the hosts, and we don't know which port until the socket actually binds; that's a bit of a problem for us.
Are there ways to submit a JAR to Flink that bypass the need for the BlobServer's random port binding? Or to control the port the BlobServer binds to?

Cheers,
Cory

On Thu, Nov 5, 2015 at 8:07 AM, Niels Basjes <ni...@basjes.nl> wrote:

> That is what I tried. Couldn't find that port though.
>
> On Thu, Nov 5, 2015 at 3:06 PM, Robert Metzger <rmetz...@apache.org> wrote:
>
>> Hi,
>>
>> cool, that's good news.
>>
>> The RM proxy is only for the web interface of the AM.
>>
>> I'm pretty sure that the MapReduce AM has at least two ports:
>> - one for the web interface (accessible through the RM proxy, so behind the firewall)
>> - one for the AM RPC (and that port is allocated within the configured range, open through the firewall).
>>
>> You can probably find the RPC port in the log file of the running MapReduce AM (to find that, identify the NodeManager running the AM, access the NM web interface, and retrieve the logs of the container running the AM).
>>
>> Maybe the mapreduce client also logs the AM RPC port when querying the status of a running job.
>>
>> On Thu, Nov 5, 2015 at 2:59 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>
>>> Hi,
>>>
>>> I checked and this setting has been set to a limited port range of only 100 port numbers.
>>>
>>> I tried to find the actual port an AM is running on and couldn't find it (I'm not the admin on that cluster).
>>>
>>> The URL to the AM that I use to access it always looks like this:
>>>
>>> http://master-001.xxxxxx.net:8088/proxy/application_1443166961758_85492/index.html
>>>
>>> As you can see, I never connect directly; always via the proxy that runs over the master on a single fixed port.
>>>
>>> Niels
>>>
>>> On Thu, Nov 5, 2015 at 2:46 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>
>>>> While discussing the issue with my colleagues today, we came up with another approach to resolve it:
>>>>
>>>> d) Upload the job jar to HDFS (or another FS) and trigger the execution of the jar using an HTTP request to the web interface.
>>>>
>>>> We could add some tooling into the /bin/flink client to submit a job like this transparently, so users would not need to bother with the file upload and request sending.
>>>> Also, Sachin started a discussion on the dev@ list to add support for submitting jobs over the web interface, so maybe we can base the fix for FLINK-2960 on that.
>>>>
>>>> I've also looked into the Hadoop MapReduce code and it seems they do the following:
>>>> When submitting a job, they upload the job jar file to HDFS. They also upload a configuration file that contains all the config options of the job. Then they submit this altogether as an application to YARN. So far, no firewall has been involved. They establish a connection between the JobClient and the ApplicationMaster when the user queries the current job status, but I could not find any special code for getting the status over HTTP.
>>>>
>>>> But I found the following configuration parameter: "yarn.app.mapreduce.am.job.client.port-range", so it seems that they try to allocate the AM port within that range (if specified).
>>>> Niels, can you check if this configuration parameter is set in your environment? I assume your firewall allows outside connections from that port range.
>>>>
>>>> So we also have a new approach:
>>>>
>>>> f) Allocate the YARN application master (and blob manager) within a user-specified port-range.
>>>>
>>>> This would be really easy to implement, because we would just need to go through the range until we find an available port.
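[Approach (f) above, "go through the range until we find an available port", can be sketched in a few lines of Java. The class and the port numbers below are illustrative only, not Flink API:]

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch of approach (f): walk a user-specified port range and bind to
// the first free port. PortRangeBinder is a hypothetical helper, not
// part of Flink.
public class PortRangeBinder {
    public static ServerSocket bindInRange(int from, int to) throws IOException {
        for (int port = from; port <= to; port++) {
            try {
                // succeeds on the first port nothing else is bound to
                return new ServerSocket(port);
            } catch (IOException portInUse) {
                // port taken; try the next one in the range
            }
        }
        throw new IOException("No free port in range " + from + "-" + to);
    }
}
```

[The server would then advertise whichever port the loop settled on, and the firewall only needs the configured range opened.]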
>>>>
>>>> On Tue, Nov 3, 2015 at 1:06 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>>>
>>>>> Great!
>>>>>
>>>>> I'll watch the issue and give it a test once I see a working patch.
>>>>>
>>>>> Niels Basjes
>>>>>
>>>>> On Tue, Nov 3, 2015 at 1:03 PM, Maximilian Michels <m...@apache.org> wrote:
>>>>>
>>>>>> Hi Niels,
>>>>>>
>>>>>> Thanks a lot for reporting this issue. I think it is a very common setup in corporate infrastructure to have restrictive firewall settings. For Flink 1.0 (and probably in a minor 0.10.X release) we will have to address this issue to ensure proper integration of Flink.
>>>>>>
>>>>>> I've created a JIRA to keep track: https://issues.apache.org/jira/browse/FLINK-2960
>>>>>>
>>>>>> Best regards,
>>>>>> Max
>>>>>>
>>>>>> On Tue, Nov 3, 2015 at 11:02 AM, Niels Basjes <ni...@basjes.nl> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I forgot to answer your other question:
>>>>>>>
>>>>>>> On Mon, Nov 2, 2015 at 4:34 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>>
>>>>>>>> so the problem is that you can not submit a job to Flink using the "/bin/flink" tool, right?
>>>>>>>> I assume Flink and its TaskManagers properly start and connect to each other (the number of TaskManagers is shown correctly in the web interface).
>>>>>>>
>>>>>>> Correct. Flink starts (I see the jobmanager UI) but the actual job is not started.
>>>>>>>
>>>>>>> Niels Basjes
>>>>>
>>>>> --
>>>>> Best regards / Met vriendelijke groeten,
>>>>>
>>>>> Niels Basjes
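[For reference, the Hadoop parameter Robert mentions above, "yarn.app.mapreduce.am.job.client.port-range", is set in mapred-site.xml. A sketch with an illustrative range; the actual range on Niels's cluster is not shown in the thread:]

```xml
<!-- mapred-site.xml: confine the MapReduce AM's client RPC port to a
     firewall-friendly range. The range below is illustrative only. -->
<property>
  <name>yarn.app.mapreduce.am.job.client.port-range</name>
  <value>50100-50200</value>
</property>
```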