Thanks for the quick reply. Let me describe the setup in more detail. I am trying to submit a single Flink job to YARN using the client:

    ./bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096 ./examples/batch/WordCount.jar

In my understanding, YARN allocates a container for the JobManager. The JobManager discovers its IP and starts the actor system. At this step the IP it uses is the internal IP address. FYI, the YARN and HDFS clusters use the public DNS names in all their configs.

Is there a way to make the JobManager use the hostname instead of the IP address? Or do you have any other suggestions?

Thanks,
Abhi

From: <ewenstep...@gmail.com> on behalf of Stephan Ewen <se...@apache.org>
Reply-To: "user@flink.apache.org" <user@flink.apache.org>
Date: Wednesday, March 9, 2016 at 6:09 AM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: Submit Flink Jobs to YARN running on AWS

Hi Abhi!

You pretty much described it correctly: Flink binds its ports to the internal IP addresses, so you cannot send a message through the external IP addresses.

Can you see if you can explicitly configure the external IP address as the JobManager hostname, so the JobManager will bind to that specific network interface?

Stephan

On Tue, Mar 8, 2016 at 12:25 AM, Bajaj, Abhinav <abhinav.ba...@here.com> wrote:

Hi,

I am a newbie to Flink and am trying to use it in AWS. I have created a YARN cluster on AWS EC2 machines and am trying to submit a Flink job to the remote YARN cluster using the Flink client running on my local machine.

The JobManager starts successfully in the YARN container, but the client is not able to connect to the JobManager.
Flink Client Logs -

13:57:34,877 INFO org.apache.flink.yarn.FlinkYarnClient - Deploying cluster, current state ACCEPTED
13:57:35,951 INFO org.apache.flink.yarn.FlinkYarnClient - Deploying cluster, current state ACCEPTED
13:57:37,027 INFO org.apache.flink.yarn.FlinkYarnClient - YARN application has been deployed successfully.
13:57:37,100 INFO org.apache.flink.yarn.FlinkYarnCluster - Start actor system.
13:57:37,532 INFO org.apache.flink.yarn.FlinkYarnCluster - Start application client.
YARN cluster started
JobManager web interface address http://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8088/proxy/application_1456184947990_0003/
Waiting until all TaskManagers have connected
13:57:37,540 INFO org.apache.flink.yarn.ApplicationClient - Notification about new leader address akka.tcp://flink@54.35.41.12:41292/user/jobmanager with session ID null.
No status updates from the YARN cluster received so far. Waiting ...
13:57:37,543 INFO org.apache.flink.yarn.ApplicationClient - Received address of new leader akka.tcp://flink@54.35.41.12:41292/user/jobmanager with session ID null.
13:57:37,543 INFO org.apache.flink.yarn.ApplicationClient - Disconnect from JobManager null.
13:57:37,545 INFO org.apache.flink.yarn.ApplicationClient - Trying to register at JobManager akka.tcp://flink@54.35.41.12:41292/user/jobmanager.
No status updates from the YARN cluster received so far. Waiting ...
The logs of the JobManager contain the following -

21:57:39,142 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@54.35.41.12:41292/]] arriving at [akka.tcp://flink@54.35.41.12:41292] inbound addresses are [akka.tcp://flink@172.31.23.18:41292]
21:57:40,782 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at ec2-54-35-41-12 (akka.tcp://flink@172.31.23.18:60565/user/taskmanager) as 72101dd2ee94caa7a5ec5a75488359aa. Current number of registered hosts is 1. Current number of alive task slots is 1.
21:57:41,162 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@54.35.41.12:41292/]] arriving at [akka.tcp://flink@54.35.41.12:41292] inbound addresses are [akka.tcp://flink@172.31.23.18:41292]

The problem seems to be a mismatch between the address the JobManager's Akka actor system is bound to and the address used by the client.

172.31.23.18 is the internal private IP of the EC2 machine where the JobManager container is running.
54.35.41.12 is the external IP of the EC2 machine, used by the Flink client to submit the job.

Because of this mismatch, the Akka actor system drops the messages: they are addressed to 54.35.41.12, which is not one of its inbound addresses.

Can someone please help me with this issue? I can share the detailed logs if required.

Thanks,
Abhi
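
For anyone hitting the same symptom, here is a minimal sketch of how the mismatch described above could be checked from a JobManager log line. It is plain Python and not part of Flink or Akka; the function name `diagnose` and the regular expression are assumptions made for illustration, and the sample line is the one from the log above.

```python
import re

# Matches akka.tcp://<system>@<host>:<port> addresses in a log line.
# (Illustrative pattern, written for the log format shown above.)
AKKA_ADDR = re.compile(
    r"akka\.tcp://(?P<system>[\w-]+)@(?P<host>[\w.\-]+):(?P<port>\d+)"
)


def diagnose(line):
    """Return (recipient_host, inbound_hosts) for a 'dropping message' line.

    Returns None if the line is not an Akka 'dropping message' error
    that lists its inbound addresses.
    """
    if "dropping message" not in line or "inbound addresses are" not in line:
        return None
    head, tail = line.split("inbound addresses are", 1)
    recipient = AKKA_ADDR.search(head)  # address the sender used
    if recipient is None:
        return None
    inbound = [m.group("host") for m in AKKA_ADDR.finditer(tail)]  # bound addresses
    return recipient.group("host"), inbound


line = (
    "21:57:39,142 ERROR akka.remote.EndpointWriter - dropping message "
    "[class akka.actor.ActorSelectionMessage] for non-local recipient "
    "[Actor[akka.tcp://flink@54.35.41.12:41292/]] arriving at "
    "[akka.tcp://flink@54.35.41.12:41292] inbound addresses are "
    "[akka.tcp://flink@172.31.23.18:41292]"
)

recipient_host, inbound_hosts = diagnose(line)
# The message is dropped when the recipient host is not an inbound address.
mismatch = recipient_host not in inbound_hosts
print(recipient_host, inbound_hosts, mismatch)
```

If `mismatch` is true, the client is addressing the actor system by a host (here the external IP) that differs from the one it is bound to (here the internal IP), which matches the behaviour in the logs.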