Re: Could not build up connection to JobManager

Dulaj Viduranga Fri, 27 Feb 2015 21:21:39 -0800

Here is the TaskManager log from when I ran "taskmanager.sh start":

flink-Vidura-taskmanager-localhost.log 
<https://gist.github.com/anonymous/aef5a0bf8722feee9b97#file-flink-vidura-taskmanager-localhost-log>


> On Feb 27, 2015, at 4:12 PM, Till Rohrmann <[email protected]> wrote:
> 
> It depends on how you started Flink. If you started a local cluster, then
> the TaskManager log is contained in the JobManager log; we just don't see
> the respective log output in the snippet you posted. If you started a
> TaskManager independently, either by taskmanager.sh or by start-cluster.sh,
> then a file with the name format flink-<user>-taskmanager-<hostname>.log
> should be created in flink/log/. If the Flink directory is not shared by
> your cluster nodes, then you have to look on the machine on which you
> started the TaskManager.
> 
> But since the JobManager binds to 127.0.0.1, I guess that you started a
> local cluster. Check whether you can find any logging statements from the
> logger org.apache.flink.runtime.taskmanager.TaskManager in your log. Maybe
> you can upload the corresponding log file to [1] and post a link here.
> 
> Greets,
> 
> Till
> 
> [1] https://gist.github.com/
> 
> On Thu, Feb 26, 2015 at 6:45 PM, Dulaj Viduranga <[email protected]>
> wrote:
> 
>> Hi,
>>        Can you tell me where I can find the TaskManager logs? I can’t find
>> them in the logs folder. I don’t suppose I should run taskmanager.sh as well.
>> Right?
>>        I’m using OS X Yosemite. I’ll send you my ifconfig.
>> 
>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
>>        options=3<RXCSUM,TXCSUM>
>>        inet6 ::1 prefixlen 128
>>        inet 127.0.0.1 netmask 0xff000000
>>        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
>>        nd6 options=1<PERFORMNUD>
>> gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
>> stf0: flags=0<> mtu 1280
>> en0: flags=8823<UP,BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
>>        ether 60:03:08:a1:e0:f4
>>        nd6 options=1<PERFORMNUD>
>>        media: autoselect (<unknown type>)
>>        status: inactive
>> en1: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
>>        options=60<TSO4,TSO6>
>>        ether 72:00:02:32:14:d0
>>        media: autoselect <full-duplex>
>>        status: inactive
>> en2: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
>>        options=60<TSO4,TSO6>
>>        ether 72:00:02:32:14:d1
>>        media: autoselect <full-duplex>
>>        status: inactive
>> bridge0: flags=8822<BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
>>        options=63<RXCSUM,TXCSUM,TSO4,TSO6>
>>        ether 62:03:08:1a:fa:00
>>        Configuration:
>>                id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
>>                maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
>>                root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
>>                ipfilter disabled flags 0x2
>>        member: en1 flags=3<LEARNING,DISCOVER>
>>                ifmaxaddr 0 port 5 priority 0 path cost 0
>>        member: en2 flags=3<LEARNING,DISCOVER>
>>                ifmaxaddr 0 port 6 priority 0 path cost 0
>>        media: <unknown type>
>>        status: inactive
>> p2p0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 2304
>>        ether 02:03:08:a1:e0:f4
>>        media: autoselect
>>        status: inactive
>> awdl0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1452
>>        ether 06:56:3d:f6:60:08
>>        nd6 options=1<PERFORMNUD>
>>        media: autoselect
>>        status: inactive
>> ppp0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1500
>>        inet 10.218.98.228 --> 10.64.64.64 netmask 0xff000000
>> utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1380
>>        inet6 fe80::b0d4:d4be:7e62:e730%utun0 prefixlen 64 scopeid 0xb
>>        inet6 fdd0:b291:7da7:9153:b0d4:d4be:7e62:e730 prefixlen 64
>>        nd6 options=1<PERFORMNUD>
>> 
>> 
>>> On Feb 26, 2015, at 10:48 PM, Stephan Ewen <[email protected]> wrote:
>>> 
>>> Hi Dulaj!
>>> 
>>> Thanks for helping to debug.
>>> 
>>> My guess is that you are now seeing the same thing between JobManager and
>>> TaskManager as you saw before between JobManager and JobClient. I have a
>>> patch pending that should help with the issue (see
>>> https://issues.apache.org/jira/browse/FLINK-1608), let's see if that
>>> solves it.
>>> 
>>> What seems not right is that the JobManager initially accepted the
>>> TaskManager and the communication failed later. Can you paste the
>>> TaskManager log as well?
>>> 
>>> Also: There must be something fairly unique about your network
>>> configuration, as it works on all other setups that we use (locally,
>>> cloud, test servers, YARN, ...). Can you paste your ipconfig / ifconfig
>>> by any chance?
>>> 
>>> Greetings,
>>> Stephan
>>> 
>>> 
>>> 
>>> On Thu, Feb 26, 2015 at 4:33 PM, Dulaj Viduranga <[email protected]>
>>> wrote:
>>> 
>>>> Hi,
>>>>       It’s great to help out. :)
>>>> 
>>>>       Setting 127.0.0.1 instead of “localhost” in
>>>> jobmanager.rpc.address helped to build the connection to the jobmanager.
>>>> Apparently “localhost” resolves differently in the webclient and the
>>>> jobmanager. I think it would be good to set "jobmanager.rpc.address:
>>>> 127.0.0.1" in future builds.
>>>>       But then I get this error when I try to run the examples. I don’t
>>>> know if I should move this issue to another thread. If so, please tell me.
>>>> 
>>>> bin/flink run
>>>> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
>>>> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt
>>>> $FLINK_DIRECTORY/count
>>>> 
>>>> 
>>>> 20:46:21,998 WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>> 02/26/2015 20:46:23     Job execution switched to status RUNNING.
>>>> 02/26/2015 20:46:23     CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to SCHEDULED
>>>> 02/26/2015 20:46:23     CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to DEPLOYING
>>>> 02/26/2015 20:48:03     CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to FAILED
>>>> akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/taskmanager#-1628133761]] after [100000 ms]
>>>>       at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
>>>>       at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
>>>>       at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
>>>>       at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
>>>>       at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
>>>>       at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
>>>>       at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
>>>>       at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
>>>>       at java.lang.Thread.run(Thread.java:745)
>>>> 
>>>> 02/26/2015 20:48:03     Job execution switched to status FAILING.
>>>> 02/26/2015 20:48:03     Reduce (SUM(1), at main(WordCount.java:72)(1/1) switched to CANCELED
>>>> 02/26/2015 20:48:03     DataSink(CsvOutputFormat (path: /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/count, delimiter:  ))(1/1) switched to CANCELED
>>>> 02/26/2015 20:48:03     Job execution switched to status FAILED.
>>>> org.apache.flink.client.program.ProgramInvocationException: The program execution failed.
>>>>       at org.apache.flink.client.program.Client.run(Client.java:344)
>>>>       at org.apache.flink.client.program.Client.run(Client.java:306)
>>>>       at org.apache.flink.client.program.Client.run(Client.java:300)
>>>>       at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
>>>>       at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>       at java.lang.reflect.Method.invoke(Method.java:483)
>>>>       at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
>>>>       at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
>>>>       at org.apache.flink.client.program.Client.run(Client.java:250)
>>>>       at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
>>>>       at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
>>>>       at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
>>>>       at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
>>>> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>>>>       at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:284)
>>>>       at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>>>       at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>>>       at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>>>       at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
>>>>       at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
>>>>       at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>>>>       at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
>>>>       at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>       at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:88)
>>>>       at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>       at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>>>       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>>       at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>>       at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>>       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>       at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>       at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>> Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/taskmanager#-1628133761]] after [100000 ms]
>>>>       at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
>>>>       at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
>>>>       at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
>>>>       at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
>>>>       at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
>>>>       at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
>>>>       at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
>>>>       at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
>>>>       at java.lang.Thread.run(Thread.java:745)
>>>> 
>>>> The exception above occurred while trying to run your command.
>>>> 
>>>> 
>>>>> On Feb 26, 2015, at 12:46 AM, Stephan Ewen <[email protected]> wrote:
>>>>> 
>>>>> Addition: To check whether a port is reachable, I think the easiest thing
>>>>> is to try and connect with a telnet client and see if the connection is
>>>>> refused.
>>>>> 
>>>>> On Wed, Feb 25, 2015 at 8:15 PM, Stephan Ewen <[email protected]> wrote:
>>>>> 
>>>>>> Okay, the problem seems to be that even though both the client and the
>>>>>> jobmanager use "localhost" as the host name, they resolve this to
>>>>>> different IP addresses: in one case 127.0.0.1, in the other case
>>>>>> 10.216.177.146.
>>>>>> 
>>>>>> Also, the 127.0.0.1 address apparently cannot communicate with
>>>>>> 10.216.177.146.
>>>>>> 
>>>>>> Can you help us debug this by checking the following:
>>>>>> 
>>>>>> - Can you try and set "jobmanager.rpc.address" to 127.0.0.1 and see if
>>>>>> that solves it?
>>>>>> - Can you try and set "jobmanager.rpc.address" to the other address
>>>>>> (10.216.177.146 or so) and see if that solves it?
>>>>>> - Can you do "start-cluster.sh", rather than "start-local.sh" and see
>>>>>> whether the webfrontend displays that the TaskManager connects?
>>>>>> - As a hard core test: Can you bring up the jobmanager, check where it
>>>>>> connects (10.216.192.98:6123 or so) and see whether the port is
>>>>>> reachable?
>>>>>> 
>>>>>> We have recently updated how the Akka URLs are built, to work around a
>>>>>> limitation in Akka. It seems that did not yet fully solve the issue.
>>>>>> 
>>>>>> Thanks for helping us debug this, it is not the easiest immigration
>>>>>> experience, but the outcome is probably extremely valuable for the
>>>>>> project :-)
>>>>>> 
>>>>>> Greetings,
>>>>>> Stephan
>>>>>> 
>>>>>> 
>>>>>> On Wed, Feb 25, 2015 at 4:03 PM, Dulaj Viduranga <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> Sorry for the delay in replying to this issue.
>>>>>>> The jobmanager.rpc.address is already set to “localhost” in conf.yaml.
>>>>>>> This can’t be the issue, because the job manager web interface, which
>>>>>>> also runs on localhost, works fine.
>>>>>>> 
>>>>>>> bin/flink run <jar> doesn’t seem to work either. Let me send you my
>>>>>>> command and the result in terminal.
>>>>>>> 
>>>>>>> bin/flink run
>>>>>>> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
>>>>>>> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt
>>>>>>> $FLINK_DIRECTORY/count
>>>>>>> 
>>>>>>> 20:32:16,442 WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>>>>> org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager.
>>>>>>>      at org.apache.flink.client.program.Client.run(Client.java:327)
>>>>>>>      at org.apache.flink.client.program.Client.run(Client.java:306)
>>>>>>>      at org.apache.flink.client.program.Client.run(Client.java:300)
>>>>>>>      at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
>>>>>>>      at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
>>>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>>      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>      at java.lang.reflect.Method.invoke(Method.java:483)
>>>>>>>      at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
>>>>>>>      at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
>>>>>>>      at org.apache.flink.client.program.Client.run(Client.java:250)
>>>>>>>      at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
>>>>>>>      at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
>>>>>>>      at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
>>>>>>>      at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
>>>>>>> Caused by: java.io.IOException: JobManager at akka.tcp://[email protected]:6123/user/jobmanager not reachable. Please make sure that the JobManager is running and its port is reachable.
>>>>>>>      at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:897)
>>>>>>>      at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
>>>>>>>      at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
>>>>>>>      at org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
>>>>>>>      at org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
>>>>>>>      at org.apache.flink.client.program.Client.run(Client.java:322)
>>>>>>>      ... 15 more
>>>>>>> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
>>>>>>>      at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>>>>>>      at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>>>>>>      at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>>>>>>      at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>>>>>>      at scala.concurrent.Await$.result(package.scala:107)
>>>>>>>      at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:893)
>>>>>>>      ... 20 more
>>>>>>> 
>>>>>>> The exception above occurred while trying to run your command.
>>>>>>> 
>>>>>>> 
>>>>>>>> On Feb 25, 2015, at 1:29 AM, Stephan Ewen <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> BTW: Does it still work if you enter "localhost" for
>>>>>>>> "jobmanager.rpc.address" in your flink-conf.yaml?
>>>>>>>> 
>>>>>>>> On Tue, Feb 24, 2015 at 7:50 PM, Stephan Ewen <[email protected]>
>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi!
>>>>>>>>> 
>>>>>>>>> I think that this is a problem in the current master (probably in
>>>>>>>>> there since a few days ago). I am fixing it...
>>>>>>>>> 
>>>>>>>>> Thanks for reporting it!
>>>>>>>>> 
>>>>>>>>> Stephan
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Tue, Feb 24, 2015 at 6:52 PM, Stephan Ewen <[email protected]>
>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Dulaj!
>>>>>>>>>> 
>>>>>>>>>> The log suggests that the JobManager binds itself to the IP
>>>>>>>>>> address 10.216.192.98 and the WebClient runs at 127.0.0.1.
>>>>>>>>>> 
>>>>>>>>>> The 127.0.0.1 actor system cannot connect to 10.216.192.98.
>>>>>>>>>> 
>>>>>>>>>> Let me verify whether this is a quirk of your particular setup, or a
>>>>>>>>>> bug recently introduced in the 0.9-SNAPSHOT.
>>>>>>>>>> 
>>>>>>>>>> Does the command line work for you? ("bin/flink run <jar>")
>>>>>>>>>> 
>>>>>>>>>> taskmanager.numberOfTaskSlots: -1  is also okay, this will mean that
>>>>>>>>>> the default of '1' is used.
>>>>>>>>>> 
>>>>>>>>>> Greetings,
>>>>>>>>>> Stephan
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Tue, Feb 24, 2015 at 5:18 PM, Dulaj Viduranga <
>>>>>>> [email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Is taskmanager.numberOfTaskSlots: -1 normal?
>>>>>>>>>>> 
>>>>>>>>>>>> On Feb 24, 2015, at 9:44 PM, Robert Metzger <
>> [email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> I could not find the logfiles attached to your mails. I think the
>>>>>>>>>>>> mailing lists are not accepting attachments.
>>>>>>>>>>>> Can you put the logs on gist.github.com?
>>>>>>>>>>>> 
>>>>>>>>>>>> The configuration values are documented here:
>>>>>>>>>>>> http://flink.apache.org/docs/0.8/config.html
>>>>>>>>>>>> For the webclient's port, it's called webclient.port
>>>>>>>>>>>> On Tue, Feb 24, 2015 at 5:04 PM, Dulaj Viduranga <
>>>>>>> [email protected]
>>>>>>>>>>>> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> I tried to kill the job manager manually in the terminal and start
>>>>>>>>>>>>> it again but no luck. Also could you tell me if it’s possible to
>>>>>>>>>>>>> change webclient’s port (8080)?
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Feb 24, 2015, at 1:41 PM, Stephan Ewen <[email protected]>
>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hey Dulaj!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> As a contributor, I would go against the latest version, which is
>>>>>>>>>>>>>> 0.9-SNAPSHOT.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> It may be in your case that the JobManager actor is down, but the
>>>>>>>>>>>>>> process still lingers. (BTW: I have a patch pending that makes sure
>>>>>>>>>>>>>> the process disappears when the actor goes down).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Could you have a look at the log "flink-<user>-jobmanager-<host>-.log"
>>>>>>>>>>>>>> and see if there are any errors logged?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>> On 24.02.2015 at 06:29, "Dulaj Viduranga" <[email protected]> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The JobManager seems to run fine. I don't know. When I tried to
>>>>>>>>>>>>>>> run start-local.sh again, it shows the PID of the running
>>>>>>>>>>>>>>> JobManager, and also :8081 runs fine. I want to contribute to the
>>>>>>>>>>>>>>> project and I could get a little boost if I could see the
>>>>>>>>>>>>>>> capabilities of FLINK. :)
>>>>>>>>>>>>>>> Will it be OK to use 0.8.1 as a developer?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Feb 24, 2015, at 04:15 AM, Stephan Ewen <[email protected]
>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Dulaj,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> That error message indicates that the JobManager is not running.
>>>>>>>>>>>>>>> Are you sure that the JobManager runs properly? Anything in the
>>>>>>>>>>>>>>> JobManager logs?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> BTW: The 0.9 branch is under heavy development / changes. That is
>>>>>>>>>>>>>>> why it may behave a bit differently on different days right now. I
>>>>>>>>>>>>>>> would recommend using the 0.8.1 release for a stable experience.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Mon, Feb 23, 2015 at 7:39 PM, Robert Metzger <
>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thank you for the quick reply.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The log you've sent is from the webclient. Can you also send the
>>>>>>>>>>>>>>> log of the JobManager?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Mon, Feb 23, 2015 at 7:28 PM, Dulaj Viduranga <
>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Yes. It seems it is not a problem with the arguments. I tried two
>>>>>>>>>>>>>>>> days but a different error occurs. It seems the web client can’t
>>>>>>>>>>>>>>>> connect to the job manager although it is running.
>>>>>>>>>>>>>>>> Right now, I can’t even get the webclient to run.
>>>>>>>>>>>>>>>> ./bin/start-webclient.sh
>>>>>>>>>>>>>>>> executes fine but I cannot connect to localhost:8080 (even with
>>>>>>>>>>>>>>>> telnet or curl).
>>>>>>>>>>>>>>>> Here is the log for jobManager
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 23:22:31,933 INFO org.apache.flink.client.web.WebInterfaceServer - Setting up web frontend server, using web-root directory 'jar:file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs'.
>>>>>>>>>>>>>>>> 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer - Web frontend server will store temporary files in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T', uploaded jobs in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-jobs', plan-json-dumps in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-plans'.
>>>>>>>>>>>>>>>> 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer - Web-frontend will submit jobs to nephele job-manager on localhost, port 6123.
>>>>>>>>>>>>>>>> 23:22:32,580 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
>>>>>>>>>>>>>>>> 23:22:32,625 INFO Remoting - Starting remoting
>>>>>>>>>>>>>>>> 23:22:32,838 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://[email protected]:51517]
>>>>>>>>>>>>>>>> 23:23:48,119 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://[email protected]:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Operation timed out: /10.218.98.169:6123
>>>>>>>>>>>>>>>> 23:23:48,124 ERROR org.apache.flink.client.WebFrontend - Unexpected exception: Could not find job manager at specified address akka.tcp://[email protected]:6123/user/jobmanager.
>>>>>>>>>>>>>>>> java.lang.RuntimeException: Could not find job manager at specified address akka.tcp://[email protected]:6123/user/jobmanager.
>>>>>>>>>>>>>>>>      at org.apache.flink.client.web.JobsInfoServlet.<init>(JobsInfoServlet.java:82)
>>>>>>>>>>>>>>>>      at org.apache.flink.client.web.WebInterfaceServer.<init>(WebInterfaceServer.java:158)
>>>>>>>>>>>>>>>>      at org.apache.flink.client.WebFrontend.main(WebFrontend.java:74)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Feb 23, 2015, at 11:46 PM, Robert Metzger <[email protected]> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> you said in the other email thread that the error only occurs for
>>>>>>>>>>>>>>>>> Wordcount, not for Kmeans.
>>>>>>>>>>>>>>>>> Can you copy me the commands for both examples?
>>>>>>>>>>>>>>>>> I can not really believe that there is a difference between the
>>>>>>>>>>>>>>>>> two jobs.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Can you also send us the contents of the jobmanager log file?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Robert
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Mon, Feb 23, 2015 at 6:04 PM, Dulaj Viduranga <[email protected]>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I’m getting "Could not build up connection to JobManager." when I
>>>>>>>>>>>>>>>>>> tried to run the wordCount example. Can anyone help?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Dulaj
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>>
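For anyone retracing this thread: the two checks suggested above (what "localhost" resolves to, and whether the JobManager port accepts a connection) can also be done from plain Java instead of telnet. This is only a minimal sketch using the JDK; the class name JobManagerReachability and its two helper methods are made up for illustration and are not part of Flink, and 6123 is simply the port that shows up in the logs above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Flink: reproduces the manual checks from the thread.
public class JobManagerReachability {

    /** Returns every address the JVM's resolver reports for the given host name. */
    public static InetAddress[] resolveAll(String host) throws UnknownHostException {
        return InetAddress.getAllByName(host);
    }

    /** Attempts a plain TCP connect, as a telnet client would; true if it succeeds. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unknown: treat all as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Defaults are assumptions taken from the thread: "localhost" and port 6123.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6123;

        for (InetAddress address : resolveAll(host)) {
            System.out.println(host + " resolves to " + address.getHostAddress());
        }
        System.out.println(host + ":" + port + " reachable: "
                + isReachable(host, port, 2000));
    }
}
```

Running it once with "localhost" and once with "127.0.0.1" (or with the 10.x address from the logs) while the JobManager is up should show whether the port is only reachable under one of the addresses, which is the kind of mismatch discussed in this thread. Note that the JVM's resolver result is not necessarily the address the JobManager chooses to bind to.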
