While discussing the issue with my colleagues today, we came up with another approach to resolve it:
d) Upload the job jar to HDFS (or another FS) and trigger the execution of the jar using an HTTP request to the web interface.

We could add some tooling to the /bin/flink client to submit a job like this transparently, so users would not need to bother with uploading the file and sending the request.
Also, Sachin started a discussion on the dev@ list to add support for submitting jobs over the web interface, so maybe we can base the fix for FLINK-2960 on that.

I've also looked into the Hadoop MapReduce code, and it seems they do the following:
When submitting a job, they upload the job jar file to HDFS. They also upload a configuration file that contains all the config options of the job. Then they submit all of this together as an application to YARN.
So far, no firewall is involved. They establish a connection between the JobClient and the ApplicationMaster when the user queries the current job status, but I could not find any special code for getting the status over HTTP.

However, I found the following configuration parameter: "yarn.app.mapreduce.am.job.client.port-range", so it seems that they try to allocate the AM port within that range (if specified).
Niels, can you check if this configuration parameter is set in your environment? I assume your firewall allows outside connections from that port range.

So we also have a new approach:

f) Allocate the YARN application master (and blob manager) within a user-specified port range.

This would be really easy to implement, because we would just need to go through the range until we find an available port.

On Tue, Nov 3, 2015 at 1:06 PM, Niels Basjes <ni...@basjes.nl> wrote:

> Great!
>
> I'll watch the issue and give it a test once I see a working patch.
>
> Niels Basjes
>
> On Tue, Nov 3, 2015 at 1:03 PM, Maximilian Michels <m...@apache.org> wrote:
>
>> Hi Niels,
>>
>> Thanks a lot for reporting this issue. I think it is a very common setup
>> in corporate infrastructure to have restrictive firewall settings. For
>> Flink 1.0 (and probably in a minor 0.10.X release) we will have to address
>> this issue to ensure proper integration of Flink.
>>
>> I've created a JIRA to keep track:
>> https://issues.apache.org/jira/browse/FLINK-2960
>>
>> Best regards,
>> Max
>>
>> On Tue, Nov 3, 2015 at 11:02 AM, Niels Basjes <ni...@basjes.nl> wrote:
>>
>>> Hi,
>>>
>>> I forgot to answer your other question:
>>>
>>> On Mon, Nov 2, 2015 at 4:34 PM, Robert Metzger <rmetz...@apache.org>
>>> wrote:
>>>
>>>> so the problem is that you can not submit a job to Flink using the
>>>> "/bin/flink" tool, right?
>>>> I assume Flink and its TaskManagers properly start and connect to each
>>>> other (the number of TaskManagers is shown correctly in the web interface).
>>>>
>>>
>>> Correct. Flink starts (i see the jobmanager UI) but the actual job is
>>> not started.
>>>
>>> Niels Basjes
>>>
>>
>>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
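
To illustrate approach f) from the message above, here is a minimal sketch of what "go through the range until we find an available port" could look like in plain Java. The class and method names (PortRangeScan, bindInRange) are hypothetical and not part of Flink or Hadoop; how the range would be configured and wired into the application master is left open.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper, not an existing Flink or Hadoop class.
class PortRangeScan {

    /**
     * Tries each port in [start, end] (inclusive) in ascending order and
     * returns a ServerSocket bound to the first one that is free.
     */
    static ServerSocket bindInRange(int start, int end) throws IOException {
        if (start < 1 || end > 65535 || start > end) {
            throw new IllegalArgumentException("Invalid port range: " + start + "-" + end);
        }
        for (int port = start; port <= end; port++) {
            try {
                // Binding succeeds only if no other listener holds this port.
                return new ServerSocket(port);
            } catch (IOException e) {
                // Port is taken (or not bindable); move on to the next one.
            }
        }
        throw new IOException("No free port in range " + start + "-" + end);
    }
}
```

The sketch returns the bound socket rather than just a port number, so there is no window between checking a port and binding to it in which another process could grab it. The caller would read the chosen port via getLocalPort() and is responsible for closing the socket.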