Hi Abhi, I'll try to reproduce the issue and come up with a solution.
On Tue, Apr 26, 2016 at 1:13 AM, Bajaj, Abhinav <abhinav.ba...@here.com> wrote: > Hi Fabian, > > Thanks for your reply and the pointers to documentation. > > In these steps, I think the Flink client is installed on the master node, > referring to steps mentioned in Flink docs here > <https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/aws.html> > . > However, the scenario I have is to run the client on my local machine and > submit jobs remotely to the YARN Cluster (running on EMR or independently). > > Let me describe in more detail here. > I am trying to submit a single Flink Job to YARN using the client, running > on my dev machine - > > ./bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096 > ./examples/batch/WordCount.jar > > In my understanding, YARN (running in AWS) allocates a container for the > Jobmanager. > Jobmanager discovers the IP and started the Actor system. At this step the > IP it uses is the internal IP address, of the EC2 instance. > > The client, running on my dev machine, is not able to connect to the > Jobmanager for reasons explained in my mail below. > > Is there a way, where I can set Jobmanager to use the hostname and not the > IP address? > > Or any other suggestions? > > Thanks, > Abhi > > *[image: cid:DACBF116-FD8C-48DB-B91D-D54510B306E8]* > > *Abhinav Bajaj* > > Senior Engineer > > HERE Predictive Analytics > > Office: +12062092767 > > Mobile: +17083299516 > > *HERE Seattle* > > 701 Pike Street, #2000, Seattle, WA 98101, USA > > *47° 36' 41" N. 122° 19' 57" W* > > *HERE Maps* > > > > > From: Fabian Hueske <fhue...@gmail.com> > Reply-To: "user@flink.apache.org" <user@flink.apache.org> > Date: Wednesday, March 9, 2016 at 12:51 AM > To: "user@flink.apache.org" <user@flink.apache.org> > Subject: Re: Submit Flink Jobs to YARN running on AWS > > Hi Abhi, > > I have used Flink on EMR via YARN a couple of times without problems. > I started a Flink YARN session like this: > > ./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096 > > This will start five YARN containers (1 JobManager with 1024MB, 4 > Taskmanagers with 4096MB). See more config options in the documentation [1]. > In one of the last lines of the std-out output you should find a line that > tells you the IP and port of the JobManager. > > With the IP and port, you can submit a job as follows: > > ./bin/flink run -m jmIP:jmPort -p 4 jobJarFile.jar <arguments> > > This will send the job to the JobManager specified by IP and port and > execute the program with a parallelism of 4. See more config options in the > documentation [2]. > > If this does not help, could you share the exact command that you use to > start the YARN session and submit the job? > > Best, Fabian > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/yarn_setup.html > [2] > https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/cli.html > > 2016-03-08 0:25 GMT+01:00 Bajaj, Abhinav <abhinav.ba...@here.com>: > >> Hi, >> >> I am a newbie to Flink and trying to use it in AWS. >> I have created a YARN cluster on AWS EC2 machines. >> Trying to submit Flink job to the remote YARN cluster using the Flink >> Client running on my local machine. >> >> The Jobmanager start successfully on the YARN container but the client is >> not able to connect to the Jobmanager. >> >> Flink Client Logs - >> >> 13:57:34,877 INFO org.apache.flink.yarn.FlinkYarnClient >> - Deploying cluster, current state ACCEPTED >> 13:57:35,951 INFO org.apache.flink.yarn.FlinkYarnClient >> - Deploying cluster, current state ACCEPTED >> 13:57:37,027 INFO org.apache.flink.yarn.FlinkYarnClient >> - YARN application has been deployed successfully. >> 13:57:37,100 INFO org.apache.flink.yarn.FlinkYarnCluster >> - Start actor system. >> 13:57:37,532 INFO org.apache.flink.yarn.FlinkYarnCluster >> - Start application client. >> YARN cluster started >> JobManager web interface address >> http://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8088/proxy/application_1456184947990_0003/ >> Waiting until all TaskManagers have connected >> 13:57:37,540 INFO org.apache.flink.yarn.ApplicationClient >> - Notification about new leader address akka.tcp: >> //flink@54.35.41.12:41292/user/jobmanager with session ID null. >> No status updates from the YARN cluster received so far. Waiting ... >> 13:57:37,543 INFO org.apache.flink.yarn.ApplicationClient >> - Received address of new leader >> akka.tcp://flink@54.35.41.12:41292/user/jobmanager >> with session ID null. >> 13:57:37,543 INFO org.apache.flink.yarn.ApplicationClient >> - Disconnect from JobManager null. >> 13:57:37,545 INFO org.apache.flink.yarn.ApplicationClient >> - Trying to register at JobManager akka.tcp://flink@54.35.41.12 >> :41292/user/jobmanager. >> No status updates from the YARN cluster received so far. Waiting ... >> >> The logs of the Jobmanager contains the following - >> >> 21:57:39,142 ERROR akka.remote.EndpointWriter >> - dropping message [class akka.actor.ActorSelectionMessage] for >> non-local recipient [Actor[akka.tcp://flink@54.35.41.12:41292/]] arriving at >> [akka.tcp://flink@54.35.41.12:41292] inbound addresses are >> [akka.tcp://flink@172.31.23.18:41292] >> 21:57:40,782 INFO org.apache.flink.runtime.instance.InstanceManager >> - Registered TaskManager at ec2-54-35-41-12 >> (akka.tcp://flink@172.31.23.18:60565/user/taskmanager) as >> 72101dd2ee94caa7a5ec5a75488359aa. Current number of registered hosts is 1. >> Current number of alive task slots is 1. >> 21:57:41,162 ERROR akka.remote.EndpointWriter >> - dropping message [class akka.actor.ActorSelectionMessage] for >> non-local recipient [Actor[akka.tcp://flink@54.35.41.12:41292/]] arriving at >> [akka.tcp://flink@54.35.41.12:41292] inbound addresses are >> [akka.tcp://flink@172.31.23.18:41292] >> >> It seems the problem is in the mismatch of the Jobmanager Akka actors >> system running address and the one user by the Client. >> 172.31.23.18 – is the internal private IP of the EC2 machine where the >> Jobmanager container is running. >> 54.35.41.12 – is the external IP of the EC2 machine, used by Flink client >> to submit the Job. >> Because of this mismatch the messages are ignored by the Akka actor >> System. >> >> Can someone please help me with this issue. >> I can share the detailed logs, if required. >> >> Thanks, >> Abhi >> >> >