Addition: To check whether a port is reachable, I think the easiest thing is to try and connect with a telnet client and see if the connection is refused.
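If no telnet client is installed, a few lines of Python can do the same check. This is only a rough sketch: `check_port` is a name I made up for illustration, and the 3-second timeout is an arbitrary choice. It attempts a plain TCP connect and reports whether the port accepted the connection, refused it, or did not answer in time.

```python
import socket


def check_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a plain TCP connect to host:port and describe the outcome.

    Hypothetical helper for illustration; not part of Flink.
    """
    try:
        # create_connection resolves the host name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout (e.g. filtered or unroutable).
        return "timeout"
    except OSError as exc:
        # Anything else, such as "no route to host" or a name resolution failure.
        return f"error: {exc}"
```

For the case in this thread one would call, for example, `check_port("127.0.0.1", 6123)` and `check_port("10.216.177.146", 6123)` while the JobManager is running, and compare the two results.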
On Wed, Feb 25, 2015 at 8:15 PM, Stephan Ewen <se...@apache.org> wrote:

> Okay, the problem seems to be that even though both the client and the
> jobmanager use "localhost" as the host name, they resolve this to different
> IP addresses: in one case 127.0.0.1, in the other case 10.216.177.146.
>
> Also, the 127.0.0.1 address apparently cannot communicate with
> 10.216.177.146.
>
> Can you help us debug this by checking the following:
>
> - Can you try and set "jobmanager.rpc.address" to 127.0.0.1 and see if
>   that solves it?
> - Can you try and set "jobmanager.rpc.address" to the other address
>   (10.216.177.146 or so) and see if that solves it?
> - Can you do "start-cluster.sh", rather than "start-local.sh", and see
>   whether the webfrontend displays that the TaskManager connects?
> - As a hard core test: Can you bring up the jobmanager, check where it
>   connects (10.216.192.98:6123 or so) and see whether the port is reachable?
>
> We have recently updated how the Akka URLs are built, to work around a
> limitation in Akka. Seems that did not yet fully solve the issue.
>
> Thanks for helping us debug this, it is not the easiest immigration
> experience, but the outcome is probably extremely valuable for the project
> :-)
>
> Greetings,
> Stephan
>
>
> On Wed, Feb 25, 2015 at 4:03 PM, Dulaj Viduranga <vidura...@icloud.com>
> wrote:
>
>> Hi,
>> Sorry for the delay in replying on this issue.
>> The jobmanager.rpc.address is already set to “localhost” in conf.yaml.
>> This can’t be the issue because the job manager web interface, which also
>> runs on localhost, works fine.
>>
>> bin/flink run <jar> doesn’t seem to work either. Let me send you my
>> command and the result in terminal.
>>
>> bin/flink run
>> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
>> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt
>> $FLINK_DIRECTORY/count
>>
>> 20:32:16,442 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager.
>>     at org.apache.flink.client.program.Client.run(Client.java:327)
>>     at org.apache.flink.client.program.Client.run(Client.java:306)
>>     at org.apache.flink.client.program.Client.run(Client.java:300)
>>     at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
>>     at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:483)
>>     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
>>     at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
>>     at org.apache.flink.client.program.Client.run(Client.java:250)
>>     at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
>>     at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
>>     at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
>>     at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
>> Caused by: java.io.IOException: JobManager at akka.tcp://flink@10.216.177.146:6123/user/jobmanager not reachable.
>> Please make sure that the JobManager is running and its port is reachable.
>>     at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:897)
>>     at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
>>     at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
>>     at org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
>>     at org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
>>     at org.apache.flink.client.program.Client.run(Client.java:322)
>>     ... 15 more
>> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
>>     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>     at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>     at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>     at scala.concurrent.Await$.result(package.scala:107)
>>     at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:893)
>>     ... 20 more
>>
>> The exception above occurred while trying to run your command.
>>
>>
>> > On Feb 25, 2015, at 1:29 AM, Stephan Ewen <se...@apache.org> wrote:
>> >
>> > BTW: Does it still work if you enter "localhost" for
>> > "jobmanager.rpc.address" in your flink-conf.yaml?
>> >
>> > On Tue, Feb 24, 2015 at 7:50 PM, Stephan Ewen <se...@apache.org> wrote:
>> >
>> >> Hi!
>> >>
>> >> I think that this is a problem in the current master (probably in there
>> >> since a few days ago). I am fixing it...
>> >>
>> >> Thanks for reporting it!
>> >>
>> >> Stephan
>> >>
>> >>
>> >> On Tue, Feb 24, 2015 at 6:52 PM, Stephan Ewen <se...@apache.org> wrote:
>> >>
>> >>> Hi Dulaj!
>> >>>
>> >>> The log suggests that the JobManager binds itself to the IP
>> >>> address 10.216.192.98 and the WebClient runs at 127.0.0.1
>> >>>
>> >>> The 127.0.0.1 actor system cannot connect to the 10.216.192.98.
>> >>>
>> >>> Let me verify whether this is a quirk of your particular setup, or a bug
>> >>> recently introduced in the 0.9-SNAPSHOT.
>> >>>
>> >>> Does the command line work for you? ("bin/flink run <jar>")
>> >>>
>> >>> taskmanager.numberOfTaskSlots: -1 is also okay, this will mean that the
>> >>> default of '1' is used.
>> >>>
>> >>> Greetings,
>> >>> Stephan
>> >>>
>> >>>
>> >>> On Tue, Feb 24, 2015 at 5:18 PM, Dulaj Viduranga <vidura...@icloud.com>
>> >>> wrote:
>> >>>
>> >>>> Is taskmanager.numberOfTaskSlots: -1 normal?
>> >>>>
>> >>>>> On Feb 24, 2015, at 9:44 PM, Robert Metzger <rmetz...@apache.org> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>> I could not find the logfiles attached to your mails. I think the
>> >>>>> mailing lists are not accepting attachments.
>> >>>>> Can you put the logs on gist.github.com?
>> >>>>>
>> >>>>> The configuration values are documented here:
>> >>>>> http://flink.apache.org/docs/0.8/config.html
>> >>>>> For the webclient's port it's called webclient.port
>> >>>>>
>> >>>>> On Tue, Feb 24, 2015 at 5:04 PM, Dulaj Viduranga <vidura...@icloud.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> I tried to kill the job manager manually in the terminal and start it
>> >>>>>> again but no luck. Also could you tell me if it’s possible to change
>> >>>>>> webclient’s port (8080) ?
>> >>>>>>
>> >>>>>>> On Feb 24, 2015, at 1:41 PM, Stephan Ewen <se...@apache.org> wrote:
>> >>>>>>>
>> >>>>>>> Hey Dulaj!
>> >>>>>>>
>> >>>>>>> As a contributor, I would go against the latest version, which is
>> >>>>>>> 0.9-SNAPSHOT.
>> >>>>>>>
>> >>>>>>> It may be in your case that the JobManager actor is down, but the process
>> >>>>>>> still lingers.
>> >>>>>>> (BTW: I have a patch pending that makes sure the process
>> >>>>>>> disappears when the actor is down).
>> >>>>>>>
>> >>>>>>> Could you have a look at the log "flink-<user>-jobmanager-<host>-.log"
>> >>>>>>> and see if there are any errors logged?
>> >>>>>>>
>> >>>>>>> Greetings,
>> >>>>>>> Stephan
>> >>>>>>> On 24.02.2015 at 06:29, "Dulaj Viduranga" <vidura...@icloud.com> wrote:
>> >>>>>>>
>> >>>>>>>> The JobManager seems to run fine. I don't know. When I tried to run
>> >>>>>>>> start-local.sh again, it shows the PID of the running JobManager, and
>> >>>>>>>> :8081 also runs fine. I want to contribute to the project and I could
>> >>>>>>>> get a little boost if I could see the capabilities of FLINK. :)
>> >>>>>>>> Will it be OK to use 0.8.1 as a developer?
>> >>>>>>>>
>> >>>>>>>> On Feb 24, 2015, at 04:15 AM, Stephan Ewen <se...@apache.org> wrote:
>> >>>>>>>>
>> >>>>>>>> Hi Dulaj,
>> >>>>>>>>
>> >>>>>>>> That error message indicates that the JobManager is not running. Are you
>> >>>>>>>> sure that the JobManager runs properly? Anything in the JobManager logs?
>> >>>>>>>>
>> >>>>>>>> BTW: The 0.9 branch is under heavy development / changes. That is why it
>> >>>>>>>> may behave a bit differently on different days right now. I would
>> >>>>>>>> recommend using the 0.8.1 release for a stable experience.
>> >>>>>>>>
>> >>>>>>>> Greetings,
>> >>>>>>>> Stephan
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Mon, Feb 23, 2015 at 7:39 PM, Robert Metzger <rmetz...@apache.org>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> Thank you for the quick reply.
>> >>>>>>>>
>> >>>>>>>> The log you've sent is from the webclient. Can you also send the log of
>> >>>>>>>> the JobManager?
>> >>>>>>>>
>> >>>>>>>> On Mon, Feb 23, 2015 at 7:28 PM, Dulaj Viduranga <vidura...@icloud.com>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Yes.
>> >>>>>>>>> It seems it is not a problem with the arguments. I tried two days but
>> >>>>>>>>> different error occurs. It seems the web client can’t connect to the
>> >>>>>>>>> job manager although it is running.
>> >>>>>>>>> Right now, I can’t even get the webclient to run.
>> >>>>>>>>> ./bin/start-webclient.sh executes fine but I cannot connect to
>> >>>>>>>>> localhost:8080 (even with telnet or curl)
>> >>>>>>>>> Here is the log for jobManager
>> >>>>>>>>>
>> >>>>>>>>> 23:22:31,933 INFO org.apache.flink.client.web.WebInterfaceServer - Setting up web frontend server, using web-root directory 'jar:file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs'.
>> >>>>>>>>> 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer - Web frontend server will store temporary files in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T', uploaded jobs in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-jobs', plan-json-dumps in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-plans'.
>> >>>>>>>>> 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer - Web-frontend will submit jobs to nephele job-manager on localhost, port 6123.
>> >>>>>>>>> 23:22:32,580 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
>> >>>>>>>>> 23:22:32,625 INFO Remoting - Starting remoting
>> >>>>>>>>> 23:22:32,838 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://JobsInfoServletActorSystem@127.0.0.1:51517]
>> >>>>>>>>> 23:23:48,119 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.218.98.169:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Operation timed out: /10.218.98.169:6123
>> >>>>>>>>> 23:23:48,124 ERROR org.apache.flink.client.WebFrontend - Unexpected exception: Could not find job manager at specified address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
>> >>>>>>>>> java.lang.RuntimeException: Could not find job manager at specified address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
>> >>>>>>>>>     at org.apache.flink.client.web.JobsInfoServlet.<init>(JobsInfoServlet.java:82)
>> >>>>>>>>>     at org.apache.flink.client.web.WebInterfaceServer.<init>(WebInterfaceServer.java:158)
>> >>>>>>>>>     at org.apache.flink.client.WebFrontend.main(WebFrontend.java:74)
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>> On Feb 23, 2015, at 11:46 PM, Robert Metzger <rmetz...@apache.org>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> Hi,
>> >>>>>>>>>> you said in the other email thread that the error only occurs for
>> >>>>>>>>>> Wordcount, not for Kmeans.
>> >>>>>>>>>> Can you copy me the commands for both examples?
>> >>>>>>>>>> I cannot really believe that there is a difference between the two
>> >>>>>>>>>> jobs.
>> >>>>>>>>>>
>> >>>>>>>>>> Can you also send us the contents of the jobmanager log file?
>> >>>>>>>>>>
>> >>>>>>>>>> Best,
>> >>>>>>>>>> Robert
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Mon, Feb 23, 2015 at 6:04 PM, Dulaj Viduranga <vidura...@icloud.com>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> I’m getting “Could not build up connection to JobManager.” when I
>> >>>>>>>>>>> tried to run the wordCount example. Can anyone help?
>> >>>>>>>>>>>
>> >>>>>>>>>>> Dulaj