Hi Ned, I think you are facing the issue described in this JIRA issue [1]. The problem is that you have a private and a public IP address and that Akka binds to the private IP address. Since the registered IP of an ActorSystem and the target IP address of a request to this ActorSystem have to be matching, you cannot reach the ActorSystem via the public IP address. Requests with a non-matching IP address are discarded, as indicated by the last log statements.
[1] https://issues.apache.org/jira/browse/FLINK-2821 Cheers, Till On Thu, Apr 14, 2016 at 10:41 AM, ned dogg <neddog...@gmail.com> wrote: > 2016-04-14 08:23:51,900 INFO > org.apache.flink.runtime.jobmanager.JobManager - > > -------------------------------------------------------------------------------- > 2016-04-14 08:23:51,902 INFO > org.apache.flink.runtime.jobmanager.JobManager - Starting > JobManager (Version: 1.0.1, Rev:4afa401, Date:31.03.2016 @ 13:40:33 UTC) > 2016-04-14 08:23:51,902 INFO > org.apache.flink.runtime.jobmanager.JobManager - Current > user: root > 2016-04-14 08:23:51,902 INFO > org.apache.flink.runtime.jobmanager.JobManager - JVM: > OpenJDK 64-Bit Server VM - Oracle Corporation - 1.7/24.95-b01 > 2016-04-14 08:23:51,902 INFO > org.apache.flink.runtime.jobmanager.JobManager - Maximum > heap size: 247 MiBytes > 2016-04-14 08:23:51,902 INFO > org.apache.flink.runtime.jobmanager.JobManager - > JAVA_HOME: /usr/lib/jvm/java-7-openjdk-amd64 > 2016-04-14 08:23:51,929 INFO > org.apache.flink.runtime.jobmanager.JobManager - Hadoop > version: 1.2.1 > 2016-04-14 08:23:51,929 INFO > org.apache.flink.runtime.jobmanager.JobManager - JVM > Options: > 2016-04-14 08:23:51,930 INFO > org.apache.flink.runtime.jobmanager.JobManager - > -Xms256m > 2016-04-14 08:23:51,930 INFO > org.apache.flink.runtime.jobmanager.JobManager - > -Xmx256m > 2016-04-14 08:23:51,930 INFO > org.apache.flink.runtime.jobmanager.JobManager - > -XX:MaxPermSize=256m > 2016-04-14 08:23:51,930 INFO > org.apache.flink.runtime.jobmanager.JobManager - > > -Dlog.file=/home/ubuntu/flink/log/flink-root-jobmanager-0-tresor-testflinkth.1541841583.streamly.com.log > 2016-04-14 08:23:51,930 INFO > org.apache.flink.runtime.jobmanager.JobManager - > -Dlog4j.configuration=file:/home/ubuntu/flink/conf/log4j.properties > 2016-04-14 08:23:51,930 INFO > org.apache.flink.runtime.jobmanager.JobManager - > -Dlogback.configurationFile=file:/home/ubuntu/flink/conf/logback.xml > 2016-04-14 08:23:51,930 INFO > org.apache.flink.runtime.jobmanager.JobManager - Program > Arguments: > 2016-04-14 08:23:51,930 INFO > org.apache.flink.runtime.jobmanager.JobManager - > --configDir > 2016-04-14 08:23:51,930 INFO > org.apache.flink.runtime.jobmanager.JobManager - > /home/ubuntu/flink/conf > 2016-04-14 08:23:51,930 INFO > org.apache.flink.runtime.jobmanager.JobManager - > --executionMode > 2016-04-14 08:23:51,930 INFO > org.apache.flink.runtime.jobmanager.JobManager - > cluster > 2016-04-14 08:23:51,930 INFO > org.apache.flink.runtime.jobmanager.JobManager - > Classpath: > > /home/ubuntu/flink/lib/flink-dist_2.10-1.0.1.jar:/home/ubuntu/flink/lib/flink-python_2.10-1.0.1.jar:/home/ubuntu/flink/lib/log4j-1.2.17.jar:/home/ubuntu/flink/lib/slf4j-log4j12-1.7.7.jar::: > 2016-04-14 08:23:51,930 INFO > org.apache.flink.runtime.jobmanager.JobManager - > > -------------------------------------------------------------------------------- > 2016-04-14 08:23:51,931 INFO > org.apache.flink.runtime.jobmanager.JobManager - Registered > UNIX signal handlers for [TERM, HUP, INT] > 2016-04-14 08:23:52,362 INFO > org.apache.flink.runtime.jobmanager.JobManager - Loading > configuration from /home/ubuntu/flink/conf > 2016-04-14 08:23:52,400 INFO > org.apache.flink.runtime.jobmanager.JobManager - Starting > JobManager without high-availability > 2016-04-14 08:23:52,408 INFO > org.apache.flink.runtime.jobmanager.JobManager - Starting > JobManager on 172.31.45.232:6123 with execution mode CLUSTER > 2016-04-14 08:23:52,655 INFO > org.apache.flink.runtime.jobmanager.JobManager - Security > is not enabled. Starting non-authenticated JobManager. > 2016-04-14 08:23:52,701 INFO > org.apache.flink.runtime.jobmanager.JobManager - Starting > JobManager > 2016-04-14 08:23:52,701 INFO > org.apache.flink.runtime.jobmanager.JobManager - Starting > JobManager actor system at 172.31.45.232:6123 > 2016-04-14 08:23:54,091 INFO akka.event.slf4j.Slf4jLogger > - Slf4jLogger started > 2016-04-14 08:23:54,293 INFO Remoting > - Starting remoting > 2016-04-14 08:23:54,712 INFO Remoting > - Remoting started; listening on addresses :[akka.tcp:// > flink@172.31.45.232:6123] > 2016-04-14 08:23:54,732 INFO > org.apache.flink.runtime.jobmanager.JobManager - Starting > JobManger web frontend > 2016-04-14 08:23:54,777 INFO > org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined > location of JobManager log file: > > /home/ubuntu/flink/log/flink-root-jobmanager-0-tresor-testflinkth.1541841583.streamly.com.log > 2016-04-14 08:23:54,777 INFO > org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined > location of JobManager stdout file: > > /home/ubuntu/flink/log/flink-root-jobmanager-0-tresor-testflinkth.1541841583.streamly.com.out > 2016-04-14 08:23:54,805 INFO > org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using > directory /tmp/flink-web-dadb12c4-4ac9-42d6-b127-712db44b4add for the web > interface files > 2016-04-14 08:23:54,805 INFO > org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using > directory /tmp/flink-web-upload-88dc8823-f114-4787-a6f4-f1955380e384 for > web frontend JAR file uploads > 2016-04-14 08:23:55,600 INFO > org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Web > frontend listening at 0:0:0:0:0:0:0:0:8081 > 2016-04-14 08:23:55,601 INFO > org.apache.flink.runtime.jobmanager.JobManager - Starting > JobManager actor > 2016-04-14 08:23:55,619 INFO org.apache.flink.runtime.blob.BlobServer > - Created BLOB server storage directory > /tmp/blobStore-bbc7f33e-fa65-41f8-8c68-549e9707fd56 > 2016-04-14 08:23:55,634 INFO org.apache.flink.runtime.blob.BlobServer > - Started BLOB server at 0.0.0.0:60439 - max concurrent > requests: 50 - max backlog: 1000 > 2016-04-14 08:23:55,653 INFO > org.apache.flink.runtime.checkpoint.SavepointStoreFactory - Using job > manager savepoint state backend. > 2016-04-14 08:23:55,678 INFO > org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Starting > with JobManager akka.tcp://flink@172.31.45.232:6123/user/jobmanager on > port > 8081 > 2016-04-14 08:23:55,678 INFO > org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader > reachable under akka.tcp://flink@172.31.45.232:6123/user/jobmanager:null. > 2016-04-14 08:23:55,692 INFO > org.apache.flink.runtime.jobmanager.JobManager - Starting > JobManager at akka.tcp://flink@172.31.45.232:6123/user/jobmanager. > 2016-04-14 08:23:55,696 INFO > org.apache.flink.runtime.jobmanager.MemoryArchivist - Started > memory archivist akka://flink/user/archive > 2016-04-14 08:23:55,702 INFO > org.apache.flink.runtime.jobmanager.JobManager - JobManager > akka.tcp://flink@172.31.45.232:6123/user/jobmanager was granted leadership > with leader session ID None. > 2016-04-14 08:24:12,740 INFO > org.apache.flink.runtime.instance.InstanceManager - Registered > TaskManager at tresor-testflinkth (akka.tcp:// > flink@172.31.45.130:42189/user/taskmanager) as > 734b6b21dd60760c7a05722db3c3d1c4. Current number of registered hosts is 1. > Current number of alive task slots is 1. > 2016-04-14 08:24:17,855 INFO > org.apache.flink.runtime.instance.InstanceManager - Registered > TaskManager at tresor-testflinkth (akka.tcp:// > flink@172.31.34.121:58814/user/taskmanager) as > fead50f6831aa3f341d58162bb918d90. Current number of registered hosts is 2. > Current number of alive task slots is 2. > 2016-04-14 08:29:50,346 ERROR akka.remote.EndpointWriter > - dropping message [class > akka.actor.ActorSelectionMessage] for non-local recipient > [Actor[akka.tcp:// > flink@54.233.183.228:6123/]] arriving at [akka.tcp:// > flink@54.233.183.228:6123] inbound addresses are [akka.tcp:// > flink@172.31.45.232:6123] > 2016-04-14 08:29:59,777 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system [akka.tcp:// > flink@127.0.0.1:35953] has failed, address is now gated for [5000] ms. > Reason is: [Disassociated]. > > > Where 54.233.183.228 <http://flink@54.233.183.228:6123> is the public > address of the VM hosting job manager and 172.31.45.232 > <http://flink@172.31.45.232:6123> is it private address. > 172.31.45.130 <http://flink@172.31.45.130:42189/user/taskmanager> and > 172.31.34.121 <http://flink@172.31.34.121:58814/user/taskmanager> are the > private address of the task manager > > On Thu, Apr 14, 2016 at 9:05 AM, Till Rohrmann <trohrm...@apache.org> > wrote: > > > I'm referring to the jobmanager.log file not the client log file. You can > > find it in the `/log` directory. > > > > Cheers, > > Till > > > > On Thu, Apr 14, 2016 at 9:56 AM, ned dogg <neddog...@gmail.com> wrote: > > > > > Hi Till > > > > > > Thanks for the prompt reply. > > > > > > The logs say that Please make sure that the actor is running and its > port > > > is reachable. > > > And it is actaully reachable because I can ping that address. > > > > > > Ned. > > > > > > On Thu, Apr 14, 2016 at 8:43 AM, Till Rohrmann < > till.rohrm...@gmail.com> > > > wrote: > > > > > > > Hi Ned, > > > > > > > > what does the logs of the JobManager say? > > > > > > > > Cheers, > > > > Till > > > > On Apr 14, 2016 9:19 AM, "ned dogg" <neddog...@gmail.com> wrote: > > > > > > > > > Hi everybody, > > > > > > > > > > I'm Ned, a young and passionte developer of apache technologies. I > > have > > > > > been playing with apache flink lastly. > > > > > > > > > > This is what I wanted to do submit a flink topology to a remote > flink > > > > > cluster. The following are the steps that I did. > > > > > > > > > > - Install flink as a cluster indicated on the link > > > > > > > > > > > > > > > > > > > > https://ci.apache.org/projects/flink/flink-docs-master/setup/cluster_setup.html > > > > > on three remotes VMs. > > > > > - Run the sample WordCountRemoteByClient > > > > > < > > > > > > > > > > > > > > > https://github.com/apache/flink/blob/master/flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/wordcount/WordCountRemoteByClient.java > > > > > > > > > > > by > > > > > changing > > > > > conf.put(Config.NIMBUS_HOST, "localhost"); to > > > > > conf.put(Config.NIMBUS_HOST, > > "publicIpOfJobmanagerInMyRemoteCluster"); > > > > > > > > > > Unfortunately for me when I run that program, I have a the > following > > > > > exception. > > > > > > > > > > org.apache.flink.client.program.ProgramInvocationException: The > main > > > > method > > > > > caused an error. > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:520) > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:403) > > > > > at > > org.apache.flink.client.program.Client.runBlocking(Client.java:248) > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:866) > > > > > at org.apache.flink.client.CliFrontend.run(CliFrontend.java:333) > > > > > at > > > > > > > > > > > > > > > org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1189) > > > > > at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1239) > > > > > Caused by: java.lang.RuntimeException: Could not connect to Flink > > > > > JobManager with address publicIpOfJobmanagerInMyRemoteCluster:6123 > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.storm.api.FlinkClient.getTopologyJobId(FlinkClient.java:305) > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.storm.api.FlinkClient.submitTopologyWithOpts(FlinkClient.java:177) > > > > > at > > > > > > > > > > > > > > > org.apache.flink.storm.api.FlinkClient.submitTopology(FlinkClient.java:167) > > > > > at > > > > > > > > > > > > > > > > > > > > stormWorldCount.WordCountRemoteByClient.main(WordCountRemoteByClient.java:72) > > > > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > > > > at > > > > > > > > > > > > > > > > > > > > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > > > > at > > > > > > > > > > > > > > > > > > > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > > > > at java.lang.reflect.Method.invoke(Method.java:483) > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:505) > > > > > ... 6 more > > > > > Caused by: java.io.IOException: Actor at akka.tcp://flink@ > > > > > publicIpOfJobmanagerInMyRemoteCluster:6123/user/jobmanager not > > > reachable. > > > > > Please make sure that the actor is running and its port is > reachable. > > > > > at > > > > > > > > > org.apache.flink.runtime.akka.AkkaUtils$.getActorRef(AkkaUtils.scala:384) > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerActorRef(JobManager.scala:2380) > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerActorRef(JobManager.scala:2400) > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.runtime.jobmanager.JobManager.getJobManagerActorRef(JobManager.scala) > > > > > at > > > > > > > > > > > > > > > org.apache.flink.storm.api.FlinkClient.getJobManager(FlinkClient.java:333) > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.storm.api.FlinkClient.getTopologyJobId(FlinkClient.java:279) > > > > > ... 14 more > > > > > Caused by: java.util.concurrent.TimeoutException: Futures timed out > > > after > > > > > [10000 milliseconds] > > > > > at > > > scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) > > > > > at > > > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > > > > > at > scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) > > > > > at > > > > > > > > > > > > > > > > > > > > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > > > > > at scala.concurrent.Await$.result(package.scala:107) > > > > > at > > > > > > > > > org.apache.flink.runtime.akka.AkkaUtils$.getActorRef(AkkaUtils.scala:380) > > > > > ... 19 more > > > > > > > > > > I try ping my jobmanager with > > > > > curl publicIpOfJobmanagerInMyRemoteCluster:6123 I had the following > > as > > > > > responces. > > > > > > > > > > curl: (52) Empty reply from server > > > > > > > > > > Which is an indication that the job manager is reachable. > > > > > > > > > > So I was wondering if I doing it the right way. Please any help > will > > be > > > > > welcoming. > > > > > > > > > > Thanks, > > > > > Ned > > > > > > > > > > > > > > >