Hey Ravinder, check out the following config keys:
blob.server.port taskmanager.rpc.port taskmanager.data.port – Ufuk On Wed, Feb 10, 2016 at 4:06 PM, Ravinder Kaur <[email protected]> wrote: > Hello Fabian, > > Thank you very much for the resource. I had already gone through this and > have found port '6123' as default for taskmanager registration. But I want > to know the specific range of ports the taskmanager access during job > execution. > > The taskmanager always tries to access a random port during job execution > for which I need to disable firewall using 'ufw allow port' during the > execution, otherwise the job hangs and finally fails. So I wanted to know a > particular range of ports which I can specify in the iptables to always > allow access. > > > Kind Regards, > Ravinder Kaur > > On Wed, Feb 10, 2016 at 2:16 PM, Fabian Hueske <[email protected]> wrote: >> >> Hi Ravinder, >> >> please have a look at the configuration documentation: >> >> --> >> https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#jobmanager-amp-taskmanager >> >> Best, Fabian >> >> 2016-02-10 13:55 GMT+01:00 Ravinder Kaur <[email protected]>: >>> >>> Hello All, >>> >>> I need to know the range of ports that are being used during the >>> master/slave communication in the Flink cluster. Also is there a way I can >>> specify a range of ports, at the slaves, to restrict them to connect to >>> master only in this range? >>> >>> Kind Regards, >>> Ravinder Kaur >>> >>> >>> On Wed, Feb 3, 2016 at 10:09 PM, Stephan Ewen <[email protected]> wrote: >>>> >>>> Can machines connect to port 6123? The firewall may block that port, put >>>> permit SSH. >>>> >>>> On Wed, Feb 3, 2016 at 9:52 PM, Ravinder Kaur <[email protected]> >>>> wrote: >>>>> >>>>> Hello, >>>>> >>>>> Here is the log file of Jobmanager. I did not see some thing suspicious >>>>> and as it suggests the ports are also listening. >>>>> >>>>> 20:58:46,906 INFO org.apache.flink.runtime.jobmanager.JobManager >>>>> - Starting JobManager on IP-of-master:6123 with execution mode CLUSTER and >>>>> streaming mode BATCH_ONLY >>>>> 20:58:46,978 INFO org.apache.flink.runtime.jobmanager.JobManager >>>>> - Security is not enabled. Starting non-authenticated JobManager. >>>>> 20:58:46,979 INFO org.apache.flink.runtime.jobmanager.JobManager >>>>> - Starting JobManager >>>>> 20:58:46,980 INFO org.apache.flink.runtime.jobmanager.JobManager >>>>> - Starting JobManager actor system at 10.155.208.138:6123 >>>>> 20:58:48,196 INFO akka.event.slf4j.Slf4jLogger >>>>> - Slf4jLogger started >>>>> 20:58:48,295 INFO Remoting >>>>> - Starting remoting >>>>> 20:58:48,541 INFO Remoting >>>>> - Remoting started; listening on addresses >>>>> :[akka.tcp://flink@IP-of-master:6123] >>>>> 20:58:48,549 INFO org.apache.flink.runtime.jobmanager.JobManager >>>>> - Starting JobManger web frontend >>>>> 20:58:48,690 INFO >>>>> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using >>>>> directory /tmp/flink-web-876a4755-4f38-4ff7-8202-f263afa9b986 for the web >>>>> interface files >>>>> 20:58:48,691 INFO >>>>> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Serving >>>>> job >>>>> manager log from >>>>> /home/flink/flink-0.10.1/log/flink-flink-jobmanager-0-hostname.log >>>>> 20:58:48,691 INFO >>>>> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Serving >>>>> job >>>>> manager stdout from >>>>> /home/flink/flink-0.10.1/log/flink-flink-jobmanager-0-hostname.out >>>>> 20:58:49,044 INFO >>>>> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Web >>>>> frontend >>>>> listening at 0:0:0:0:0:0:0:0:8081 >>>>> 20:58:49,045 INFO org.apache.flink.runtime.jobmanager.JobManager >>>>> - Starting JobManager actor >>>>> 20:58:49,052 INFO org.apache.flink.runtime.blob.BlobServer >>>>> - Created BLOB server storage directory >>>>> /tmp/blobStore-e0c52bfb-2411-4a83-ac8d-5664a5894258 >>>>> 20:58:49,054 INFO org.apache.flink.runtime.blob.BlobServer >>>>> - Started BLOB server at 0.0.0.0:43683 - max concurrent requests: 50 - max >>>>> backlog: 1000 >>>>> 20:58:49,075 INFO org.apache.flink.runtime.jobmanager.MemoryArchivist >>>>> - Started memory archivist akka://flink/user/archive >>>>> 20:58:49,075 INFO org.apache.flink.runtime.jobmanager.JobManager >>>>> - Starting JobManager at >>>>> akka.tcp://flink@IP-of-master:6123/user/jobmanager. >>>>> 20:58:49,081 INFO org.apache.flink.runtime.jobmanager.JobManager >>>>> - JobManager akka.tcp://flink@IP-of-master:6123/user/jobmanager was >>>>> granted >>>>> leadership with leader session ID None. >>>>> 20:58:49,082 INFO >>>>> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Starting >>>>> with JobManager akka.tcp://flink@IP-of-master:6123/user/jobmanager on port >>>>> 8081 >>>>> 20:58:49,083 INFO >>>>> org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader >>>>> reachable under akka.tcp://flink@IP-of-master:6123/user/jobmanager:null. >>>>> 20:59:22,794 INFO org.apache.flink.runtime.jobmanager.JobManager >>>>> - Submitting job 72733d69588678ec224003ab5577cab8 (Flink Java Job at Wed >>>>> Feb >>>>> 03 20:59:22 CET 2016). >>>>> 20:59:22,853 INFO org.apache.flink.runtime.jobmanager.JobManager >>>>> - Scheduling job 72733d69588678ec224003ab5577cab8 (Flink Java Job at Wed >>>>> Feb >>>>> 03 20:59:22 CET 2016). >>>>> 20:59:22,857 INFO org.apache.flink.runtime.jobmanager.JobManager >>>>> - Status of job 72733d69588678ec224003ab5577cab8 (Flink Java Job at Wed >>>>> Feb >>>>> 03 20:59:22 CET 2016) changed to RUNNING. >>>>> 20:59:22,859 INFO >>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - CHAIN >>>>> DataSource (at getDefaultTextLineDataSet(WordCountData.java:70) >>>>> (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap >>>>> at >>>>> main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72) >>>>> (1/1) >>>>> (23fb37019a504fd6c7bf95e46a8cd7a3) switched from CREATED to SCHEDULED >>>>> 20:59:22,881 INFO >>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - CHAIN >>>>> DataSource (at getDefaultTextLineDataSet(WordCountData.java:70) >>>>> (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap >>>>> at >>>>> main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72) >>>>> (1/1) >>>>> (23fb37019a504fd6c7bf95e46a8cd7a3) switched from SCHEDULED to CANCELED >>>>> 20:59:22,881 INFO org.apache.flink.runtime.jobmanager.JobManager >>>>> - Status of job 72733d69588678ec224003ab5577cab8 (Flink Java Job at Wed >>>>> Feb >>>>> 03 20:59:22 CET 2016) changed to FAILING. >>>>> >>>>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: >>>>> Not enough free slots available to run the job. You can decrease the >>>>> operator parallelism or increase the number of slots per TaskManager in >>>>> the >>>>> configuration. Task to schedule: < Attempt #0 (CHAIN DataSource (at >>>>> getDefaultTextLineDataSet(WordCountData.java:70) >>>>> (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap >>>>> at >>>>> main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72) >>>>> (1/1)) @ (unassigned) - [SCHEDULED] > with groupID < >>>>> 31e497f2f68c9cee5864c8fddaff3d59 > in sharing group < SlotSharingGroup >>>>> [f9ed1aab933e061a8ce1ecaa3534f18c, 037bb78a1902f7edea69a978ad7b54ce, >>>>> 31e497f2f68c9cee5864c8fddaff3d59] >. Resources available to scheduler: >>>>> Number of instances=0, total number of slots=0, available slots=0 >>>>> at >>>>> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:256) >>>>> at >>>>> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:131) >>>>> at >>>>> org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:298) >>>>> at >>>>> org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:458) >>>>> at >>>>> org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:322) >>>>> at >>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:679) >>>>> at >>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:982) >>>>> at >>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962) >>>>> at >>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962) >>>>> at >>>>> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) >>>>> at >>>>> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) >>>>> at >>>>> akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) >>>>> at >>>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) >>>>> at >>>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >>>>> at >>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) >>>>> at >>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) >>>>> at >>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) >>>>> 20:59:22,886 INFO >>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - CHAIN >>>>> Reduce >>>>> (SUM(1), at main(WordCount.java:72) -> FlatMap (collect()) (1/1) >>>>> (824b6e3771304cd0f92aea4ab763a11d) switched from CREATED to CANCELED >>>>> 20:59:22,887 INFO >>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - DataSink >>>>> (collect() sink) (1/1) (1bb64a2edc6f68ad716acd9f8d2d7d67) switched from >>>>> CREATED to CANCELED >>>>> 20:59:22,890 INFO org.apache.flink.runtime.jobmanager.JobManager >>>>> - Status of job 72733d69588678ec224003ab5577cab8 (Flink Java Job at Wed >>>>> Feb >>>>> 03 20:59:22 CET 2016) changed to FAILED. >>>>> >>>>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: >>>>> Not enough free slots available to run the job. You can decrease the >>>>> operator parallelism or increase the number of slots per TaskManager in >>>>> the >>>>> configuration. Task to schedule: < Attempt #0 (CHAIN DataSource (at >>>>> getDefaultTextLineDataSet(WordCountData.java:70) >>>>> (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap >>>>> at >>>>> main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72) >>>>> (1/1)) @ (unassigned) - [SCHEDULED] > with groupID < >>>>> 31e497f2f68c9cee5864c8fddaff3d59 > in sharing group < SlotSharingGroup >>>>> [f9ed1aab933e061a8ce1ecaa3534f18c, 037bb78a1902f7edea69a978ad7b54ce, >>>>> 31e497f2f68c9cee5864c8fddaff3d59] >. Resources available to scheduler: >>>>> Number of instances=0, total number of slots=0, available slots=0 >>>>> at >>>>> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:256) >>>>> at >>>>> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:131) >>>>> at >>>>> org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:298) >>>>> at >>>>> org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:458) >>>>> at >>>>> org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:322) >>>>> at >>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:679) >>>>> at >>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:982) >>>>> at >>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962) >>>>> at >>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962) >>>>> at >>>>> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) >>>>> at >>>>> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) >>>>> >>>>> >>>>> On Wed, Feb 3, 2016 at 9:27 PM, Robert Metzger <[email protected]> >>>>> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> the TaskManager is starting up, but its not able to register at the >>>>>> job manager. Did you check the JobManager log? Do you see anything >>>>>> suspicious there? Are the ports matching? >>>>>> >>>>>> >>>>>> On Wed, Feb 3, 2016 at 9:23 PM, Ravinder Kaur <[email protected]> >>>>>> wrote: >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> Thank you for pointing it out. I had a little typo while I edited the >>>>>>> hostname in flink-conf.yaml. I've reset it and the TaskManager started >>>>>>> up. >>>>>>> But I still can't run the WordCount example and it throws the same >>>>>>> NoResourceAvaliableException. >>>>>>> >>>>>>> Caused by: >>>>>>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableExce >>>>>>> ption: Not enough free slots available to run the job. You can decrease >>>>>>> the >>>>>>> oper >>>>>>> ator parallelism or increase the number of slots per TaskManager in the >>>>>>> configur >>>>>>> ation. Task to schedule: < Attempt #0 (CHAIN DataSource (at >>>>>>> getDefaultTextLineDa >>>>>>> taSet(WordCountData.java:70) >>>>>>> (org.apache.flink.api.java.io.CollectionInputFormat >>>>>>> )) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at >>>>>>> main(Wo >>>>>>> rdCount.java:72) (1/1)) @ (unassigned) - [SCHEDULED] > with groupID < >>>>>>> 31e497f2f6 >>>>>>> 8c9cee5864c8fddaff3d59 > in sharing group < SlotSharingGroup >>>>>>> [f9ed1aab933e061a8c >>>>>>> e1ecaa3534f18c, 037bb78a1902f7edea69a978ad7b54ce, >>>>>>> 31e497f2f68c9cee5864c8fddaff3d >>>>>>> 59] >. Resources available to scheduler: Number of instances=0, total >>>>>>> number >>>>>>> of >>>>>>> slots=0, available slots=0 >>>>>>> at >>>>>>> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask( >>>>>>> Scheduler.java:256) >>>>>>> at >>>>>>> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmed >>>>>>> iately(Scheduler.java:131) >>>>>>> at >>>>>>> org.apache.flink.runtime.executiongraph.Execution.scheduleForExecutio >>>>>>> n(Execution.java:298) >>>>>>> at >>>>>>> org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForEx >>>>>>> ecution(ExecutionVertex.java:458) >>>>>>> at >>>>>>> org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAl >>>>>>> l(ExecutionJobVertex.java:322) >>>>>>> at >>>>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExe >>>>>>> cution(ExecutionGraph.java:679) >>>>>>> at >>>>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$fl >>>>>>> ink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:982 >>>>>>> ) >>>>>>> at >>>>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$fl >>>>>>> ink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962) >>>>>>> at >>>>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$fl >>>>>>> ink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962) >>>>>>> ... 8 more >>>>>>> >>>>>>> The log of TaskManager again has the same errors as before. >>>>>>> >>>>>>> 20:58:58,457 INFO org.apache.flink.runtime.net.ConnectionUtils >>>>>>> - Failed to connect from address '/slave-IP': connect timed out >>>>>>> 20:58:58,458 INFO org.apache.flink.runtime.net.ConnectionUtils >>>>>>> - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is >>>>>>> unreachable >>>>>>> 20:58:58,458 INFO org.apache.flink.runtime.net.ConnectionUtils >>>>>>> - Failed to connect from address '/127.0.0.1': Invalid argument >>>>>>> 20:58:59,048 WARN org.apache.flink.runtime.net.ConnectionUtils >>>>>>> - Could not connect to /master-IP:6123. Selecting a local address using >>>>>>> heuristics. >>>>>>> 20:58:59,050 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>> - TaskManager will use hostname/address 'hostname-of-slave' (slave-IP) >>>>>>> for >>>>>>> communication. >>>>>>> 20:58:59,051 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>> - Starting TaskManager in streaming mode BATCH_ONLY >>>>>>> 20:58:59,052 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>> - Starting TaskManager actor system at slave_IP:0 >>>>>>> 20:58:59,776 INFO akka.event.slf4j.Slf4jLogger >>>>>>> - Slf4jLogger started >>>>>>> 20:58:59,842 INFO Remoting >>>>>>> - Starting remoting >>>>>>> 20:59:00,094 INFO Remoting >>>>>>> - Remoting started; listening on addresses >>>>>>> :[akka.tcp://flink@slave-IP:33813] >>>>>>> 20:59:00,100 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>> - Starting TaskManager actor >>>>>>> 20:59:00,125 INFO >>>>>>> org.apache.flink.runtime.io.network.netty.NettyConfig - >>>>>>> NettyConfig >>>>>>> [server address: hostname-of-master/master-IP, server port: 49030, >>>>>>> memory >>>>>>> segment size (bytes): 32768, transport type: NIO, number of server >>>>>>> threads: >>>>>>> 0 (use Netty's default), number of client threads: 0 (use Netty's >>>>>>> default), >>>>>>> server connect backlog: 0 (use Netty's default), client connect timeout >>>>>>> (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)] >>>>>>> 20:59:00,131 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>> - Messages between TaskManager and JobManager have a max timeout of >>>>>>> 100000 >>>>>>> milliseconds >>>>>>> 20:59:00,142 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>> - Temporary file directory '/tmp': total 4 GB, usable 1 GB (25.00% >>>>>>> usable) >>>>>>> 20:59:00,210 INFO >>>>>>> org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - >>>>>>> Allocated 64 >>>>>>> MB for network buffer pool (number of memory segments: 2048, bytes per >>>>>>> segment: 32768). >>>>>>> 20:59:00,323 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>> - Using 0.7 of the currently free heap space for Flink managed heap >>>>>>> memory >>>>>>> (293 MB). >>>>>>> 20:59:00,565 INFO >>>>>>> org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O >>>>>>> manager >>>>>>> uses directory /tmp/flink-io-c7796b82-6676-4604-97fd-df09001a84e8 for >>>>>>> spill >>>>>>> files. >>>>>>> 20:59:00,578 INFO org.apache.flink.runtime.filecache.FileCache >>>>>>> - User file cache uses directory >>>>>>> /tmp/flink-dist-cache-13ed3e76-cf1e-46fa-9ba2-5177e801429e >>>>>>> 20:59:00,908 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>> - Starting TaskManager actor at >>>>>>> akka://flink/user/taskmanager#-157676733. >>>>>>> 20:59:00,908 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>> - TaskManager data connection information: hostname-of-master >>>>>>> (dataPort=49030) >>>>>>> 20:59:00,909 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>> - TaskManager has 1 task slot(s). >>>>>>> 20:59:00,910 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>> - Memory usage stats: [HEAP: 376/491/491 MB, NON HEAP: 24/49/304 MB >>>>>>> (used/committed/max)] >>>>>>> 20:59:00,917 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>> - Trying to register at JobManager >>>>>>> akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 1, timeout: 500 >>>>>>> milliseconds) >>>>>>> 20:59:01,443 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>> - Trying to register at JobManager >>>>>>> akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 2, timeout: >>>>>>> 1000 >>>>>>> milliseconds) >>>>>>> 20:59:02,873 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>> - Trying to register at JobManager >>>>>>> akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 3, timeout: >>>>>>> 2000 >>>>>>> milliseconds) >>>>>>> 20:59:04,893 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>> - Trying to register at JobManager >>>>>>> akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 4, timeout: >>>>>>> 4000 >>>>>>> milliseconds) >>>>>>> 20:59:08,914 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>> - Trying to register at JobManager >>>>>>> akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 5, timeout: >>>>>>> 8000 >>>>>>> milliseconds) >>>>>>> >>>>>>> >>>>>>> Kind Regards, >>>>>>> Ravinder Kaur >>>>>>> >>>>>>> On Wed, Feb 3, 2016 at 8:12 PM, Stephan Ewen <[email protected]> >>>>>>> wrote: >>>>>>>> >>>>>>>> This looks like the reason: >>>>>>>> >>>>>>>> java.net.UnknownHostException: Cannot resolve the JobManager >>>>>>>> hostname 'hostname-of-master' specified in the configuration >>>>>>>> >>>>>>>> On Wed, Feb 3, 2016 at 7:29 PM, Ravinder Kaur <[email protected]> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> The log file of the Taskmanager now shows the following >>>>>>>>> >>>>>>>>> 18:27:10,082 WARN org.apache.hadoop.util.NativeCodeLoader >>>>>>>>> - Unable to load native-hadoop library for your platform... using >>>>>>>>> builtin-java classes where applicable >>>>>>>>> 18:27:10,244 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - >>>>>>>>> -------------------------------------------------------------------------------- >>>>>>>>> 18:27:10,244 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - Starting TaskManager (Version: 0.10.1, Rev:2e9b231, >>>>>>>>> Date:22.11.2015 @ >>>>>>>>> 12:41:12 CET) >>>>>>>>> 18:27:10,244 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - Current user: flink >>>>>>>>> 18:27:10,245 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.7/24.91-b01 >>>>>>>>> 18:27:10,245 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - Maximum heap size: 491 MiBytes >>>>>>>>> 18:27:10,245 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - JAVA_HOME: /usr/lib/jvm/java-1.7.0-openjdk-amd64 >>>>>>>>> 18:27:10,247 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - Hadoop version: 2.7.0 >>>>>>>>> 18:27:10,247 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - JVM Options: >>>>>>>>> 18:27:10,247 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - -Xms512M >>>>>>>>> 18:27:10,247 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - -Xmx512M >>>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - -XX:MaxDirectMemorySize=8388607T >>>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - -XX:MaxPermSize=256m >>>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - >>>>>>>>> -Dlog.file=/home/flink/flink-0.10.1/log/flink-flink-taskmanager-0-vm-10-155-208-137.cloud.mwn.de.log >>>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - >>>>>>>>> -Dlog4j.configuration=file:/home/flink/flink-0.10.1/conf/log4j.properties >>>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - >>>>>>>>> -Dlogback.configurationFile=file:/home/flink/flink-0.10.1/conf/logback.xml >>>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - Program Arguments: >>>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - --configDir >>>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - /home/flink/flink-0.10.1/conf >>>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - --streamingMode >>>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - batch >>>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - Classpath: >>>>>>>>> /home/flink/flink-0.10.1/lib/flink-dist_2.11-0.10.1.jar:/home/flink/flink-0.10.1/lib/flink-python_2.11-0.10.1.jar:/home/flink/flink-0.10.1/lib/log4j-1.2.17.jar:/home/flink/flink-0.10.1/lib/slf4j-log4j12-1.7.7.jar:/usr/lib/jvm/java-1.7.0-openjdk-amd64/lib/tools.jar:: >>>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - >>>>>>>>> -------------------------------------------------------------------------------- >>>>>>>>> 18:27:10,252 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - Maximum number of open file descriptors is 4096 >>>>>>>>> 18:27:10,277 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - Loading configuration from /home/flink/flink-0.10.1/conf >>>>>>>>> 18:27:10,356 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - Security is not enabled. Starting non-authenticated TaskManager. >>>>>>>>> 18:27:10,365 ERROR org.apache.flink.runtime.taskmanager.TaskManager >>>>>>>>> - Failed to run TaskManager. >>>>>>>>> java.net.UnknownHostException: Cannot resolve the JobManager >>>>>>>>> hostname 'hostname-of-master' specified in the configuration >>>>>>>>> at >>>>>>>>> org.apache.flink.runtime.util.StandaloneUtils.createLeaderRetrievalService(StandaloneUtils.java:79) >>>>>>>>> at >>>>>>>>> org.apache.flink.runtime.util.StandaloneUtils.createLeaderRetrievalService(StandaloneUtils.java:48) >>>>>>>>> at >>>>>>>>> org.apache.flink.runtime.util.LeaderRetrievalUtils.createLeaderRetrievalService(LeaderRetrievalUtils.java:69) >>>>>>>>> at >>>>>>>>> org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndPort(TaskManager.scala:1351) >>>>>>>>> at >>>>>>>>> org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1328) >>>>>>>>> at >>>>>>>>> org.apache.flink.runtime.taskmanager.TaskManager$.main(TaskManager.scala:1240) >>>>>>>>> at >>>>>>>>> org.apache.flink.runtime.taskmanager.TaskManager.main(TaskManager.scala) >>>>>>>>> >>>>>>>>> Kind Regards, >>>>>>>>> Ravinder Kaur >>>>>>>>> >>>>>>>>> On Wed, Feb 3, 2016 at 7:19 PM, Stephan Ewen <[email protected]> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> What do the TaskManger logs say? >>>>>>>>>> >>>>>>>>>> On Wed, Feb 3, 2016 at 6:34 PM, Ravinder Kaur >>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>> Hello, >>>>>>>>>>> >>>>>>>>>>> Thanks for the quick reply. I tried to set jobmanager.rpc.address >>>>>>>>>>> in flink-conf.yaml to the hostname of master node on both the nodes. >>>>>>>>>>> >>>>>>>>>>> Now it does not start the Taskmanager at the worker node at all. >>>>>>>>>>> When I start the cluster using ./bin/start-cluster.sh on master it >>>>>>>>>>> shows the >>>>>>>>>>> normal output of starting the Jobmanager and Taskmanager but when I >>>>>>>>>>> run jps >>>>>>>>>>> on the nodes the slave does not have the Taskmanager running. >>>>>>>>>>> >>>>>>>>>>> Running the WordCount example again fails showing the same error. >>>>>>>>>>> Stopping the cluster says no taskmanager to stop. >>>>>>>>>>> >>>>>>>>>>> Kind Regards, >>>>>>>>>>> Ravinder Kaur >>>>>>>>>>> >>>>>>>>>>> On Wed, Feb 3, 2016 at 5:47 PM, Stephan Ewen <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Looks like the network configuration is not correct. >>>>>>>>>>>> >>>>>>>>>>>> I would try setting the full host name (like >>>>>>>>>>>> "master.abc.xyz.com") as jobmanager.rpc.address. >>>>>>>>>>>> >>>>>>>>>>>> Greetings, >>>>>>>>>>>> Stephan >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Feb 3, 2016 at 5:43 PM, Ravinder Kaur >>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Hello Community, >>>>>>>>>>>>> >>>>>>>>>>>>> I'm a student and new to Apache Flink. I'm trying to learn and >>>>>>>>>>>>> have setup a 2- node standalone Flink(0.10.1) cluster (one master >>>>>>>>>>>>> and one >>>>>>>>>>>>> worker). I'm facing the following issue. >>>>>>>>>>>>> >>>>>>>>>>>>> Cluster: consists of 2 vms (one master and one worker) >>>>>>>>>>>>> >>>>>>>>>>>>> The configurations are done as per >>>>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/cluster_setup.html >>>>>>>>>>>>> >>>>>>>>>>>>> When I start the cluster both the JobManager and the >>>>>>>>>>>>> TaskManager are started on the master and worker respectively. >>>>>>>>>>>>> >>>>>>>>>>>>> Command to start the cluster : bin/start-cluster.sh >>>>>>>>>>>>> >>>>>>>>>>>>> JPS shows all the processes running. >>>>>>>>>>>>> >>>>>>>>>>>>> Then I run the following command to run a WordCount example >>>>>>>>>>>>> job: ./bin/flink run ./examples/WordCount.jar >>>>>>>>>>>>> >>>>>>>>>>>>> the result is attached with the mail. >>>>>>>>>>>>> >>>>>>>>>>>>> The error is >>>>>>>>>>>>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailabeException: >>>>>>>>>>>>> Not enough free slots available to run to run the job >>>>>>>>>>>>> ....................... Resources available to scheduler: Number >>>>>>>>>>>>> of >>>>>>>>>>>>> instances=0, total number of slots= 0, available slots=0 >>>>>>>>>>>>> >>>>>>>>>>>>> Therefore I suppose that the JobManager does not find the >>>>>>>>>>>>> TaskManager and checked the logs of the TaskManager which indeed >>>>>>>>>>>>> shows that >>>>>>>>>>>>> the TaskManager is unable to register at the JobManager for quite >>>>>>>>>>>>> a long >>>>>>>>>>>>> time. There are org.apache.flink.runtime.net.ConnectionUtils: >>>>>>>>>>>>> Failed to >>>>>>>>>>>>> connect from localhost: Connect timed out and >>>>>>>>>>>>> org.apache.flink.runtime.net.ConnectionUtils: Failed to connect >>>>>>>>>>>>> from address >>>>>>>>>>>>> localhost: Network is Unreachable messages in the log of the >>>>>>>>>>>>> TaskManager. >>>>>>>>>>>>> Later when it starts up after a number of attempts and tries to >>>>>>>>>>>>> register at >>>>>>>>>>>>> the JobManager, which also fails after a lot of attempts showing >>>>>>>>>>>>> the >>>>>>>>>>>>> following message >>>>>>>>>>>>> org.apache.flink.runtime.taskmanager.Taskmanager: Trying >>>>>>>>>>>>> to register at JobManager >>>>>>>>>>>>> akka.tcp://flink@master:6123/user'/jobmanager >>>>>>>>>>>>> (attempt:92, timeout:30seconds) and >>>>>>>>>>>>> org.apache.flink.runtime.taskmanager.Taskmanager: Tried to >>>>>>>>>>>>> associate with >>>>>>>>>>>>> unreachable remote host >>>>>>>>>>>>> [akka.tcp://flink@master:6123/user/jobmanager]. >>>>>>>>>>>>> Address is now gated for 5000ms, all messages to this address >>>>>>>>>>>>> will be >>>>>>>>>>>>> delivered to dead letters. Reason: Connection timed out: >>>>>>>>>>>>> /master:6123 >>>>>>>>>>>>> >>>>>>>>>>>>> I browsed the internet for these and found >>>>>>>>>>>>> http://stackoverflow.com/questions/33601020/flink-job-wont-run-with-higher-taskmanager-heap-mb >>>>>>>>>>>>> and https://issues.apache.org/jira/browse/FLINK-1119 these >>>>>>>>>>>>> links helpful. Stephan Ewen the guy who provided the solution in >>>>>>>>>>>>> both the >>>>>>>>>>>>> links gives a good explanation that the TaskManagers take quite >>>>>>>>>>>>> some time to >>>>>>>>>>>>> register at the JobManager and therefore I waited for as long as >>>>>>>>>>>>> 20 mins >>>>>>>>>>>>> after starting the cluster to run the job. But even after waiting >>>>>>>>>>>>> so long I >>>>>>>>>>>>> get the same error. >>>>>>>>>>>>> >>>>>>>>>>>>> Another suggestion was to run the cluster in streaming mode. So >>>>>>>>>>>>> I tried it with the command : bin/start-cluster-streaming.sh and >>>>>>>>>>>>> ran the job >>>>>>>>>>>>> but I get the same error. I have rechecked all the configurations >>>>>>>>>>>>> but I'm >>>>>>>>>>>>> unable to find out the fault. >>>>>>>>>>>>> >>>>>>>>>>>>> I re-checked all the configurations but could not find anything >>>>>>>>>>>>> wrong. Also checked the port 6123 on master which is in LISTEN >>>>>>>>>>>>> state and tcp >>>>>>>>>>>>> request from worker to master shows SYN_SENT state using netstat >>>>>>>>>>>>> -na and >>>>>>>>>>>>> lsof -i commands. >>>>>>>>>>>>> >>>>>>>>>>>>> I opened the webpage on master http://localhost:8081 but it >>>>>>>>>>>>> shows nothing and localhost:8080 says connection refused. >>>>>>>>>>>>> >>>>>>>>>>>>> Kindly help me out as it is very important for me. Let me know >>>>>>>>>>>>> if you have any questions. >>>>>>>>>>>>> >>>>>>>>>>>>> Kind Regards, >>>>>>>>>>>>> Ravinder Kaur >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
