Hello,
Thank you for pointing it out. I had made a small typo when editing the
hostname in flink-conf.yaml. I've corrected it and the TaskManager now
starts up. But I still can't run the WordCount example; it throws the same
NoResourceAvailableException.
Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #0 (CHAIN DataSource (at getDefaultTextLineDataSet(WordCountData.java:70) (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72) (1/1)) @ (unassigned) - [SCHEDULED] > with groupID < 31e497f2f68c9cee5864c8fddaff3d59 > in sharing group < SlotSharingGroup [f9ed1aab933e061a8ce1ecaa3534f18c, 037bb78a1902f7edea69a978ad7b54ce, 31e497f2f68c9cee5864c8fddaff3d59] >. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
    at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:256)
    at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:131)
    at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:298)
    at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:458)
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:322)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:679)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:982)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
    ... 8 more
The TaskManager log again shows the same errors as before:
20:58:58,457 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/slave-IP': connect timed out
20:58:58,458 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
20:58:58,458 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
20:58:59,048 WARN org.apache.flink.runtime.net.ConnectionUtils - Could not connect to /master-IP:6123. Selecting a local address using heuristics.
20:58:59,050 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager will use hostname/address 'hostname-of-slave' (slave-IP) for communication.
20:58:59,051 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager in streaming mode BATCH_ONLY
20:58:59,052 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor system at slave_IP:0
20:58:59,776 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
20:58:59,842 INFO Remoting - Starting remoting
20:59:00,094 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink@slave-IP:33813]
20:59:00,100 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor
20:59:00,125 INFO org.apache.flink.runtime.io.network.netty.NettyConfig - NettyConfig [server address: hostname-of-master/master-IP, server port: 49030, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 0 (use Netty's default), number of client threads: 0 (use Netty's default), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
20:59:00,131 INFO org.apache.flink.runtime.taskmanager.TaskManager - Messages between TaskManager and JobManager have a max timeout of 100000 milliseconds
20:59:00,142 INFO org.apache.flink.runtime.taskmanager.TaskManager - Temporary file directory '/tmp': total 4 GB, usable 1 GB (25.00% usable)
20:59:00,210 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 64 MB for network buffer pool (number of memory segments: 2048, bytes per segment: 32768).
20:59:00,323 INFO org.apache.flink.runtime.taskmanager.TaskManager - Using 0.7 of the currently free heap space for Flink managed heap memory (293 MB).
20:59:00,565 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-c7796b82-6676-4604-97fd-df09001a84e8 for spill files.
20:59:00,578 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-13ed3e76-cf1e-46fa-9ba2-5177e801429e
20:59:00,908 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor at akka://flink/user/taskmanager#-157676733.
20:59:00,908 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager data connection information: hostname-of-master (dataPort=49030)
20:59:00,909 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager has 1 task slot(s).
20:59:00,910 INFO org.apache.flink.runtime.taskmanager.TaskManager - Memory usage stats: [HEAP: 376/491/491 MB, NON HEAP: 24/49/304 MB (used/committed/max)]
20:59:00,917 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds)
20:59:01,443 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 2, timeout: 1000 milliseconds)
20:59:02,873 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 3, timeout: 2000 milliseconds)
20:59:04,893 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 4, timeout: 4000 milliseconds)
20:59:08,914 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 5, timeout: 8000 milliseconds)
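
Since the log shows the worker timing out against master-IP:6123, one way to tell a Flink problem from a plain network problem is to test name resolution and a raw TCP connect from the worker, independent of Flink. The following is only a minimal Python sketch under that assumption. The helper names resolve and can_connect are made up for illustration and are not part of Flink; the host and port to pass in would be the jobmanager.rpc.address value and the 6123 port seen in the log.

```python
import socket


def resolve(hostname):
    """Return the sorted addresses the local resolver gives for hostname.

    Returns an empty list if the name cannot be resolved, which would match
    the UnknownHostException case on the Flink side.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=5.0):
    """Attempt a plain TCP connect to host:port.

    Returns (True, "") on success, or (False, reason) where reason is the
    text of the OS-level error (timeout, connection refused, unreachable...).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        return False, str(exc)
```

Run on the worker, resolve("hostname-of-master") should return the master's address, and can_connect("hostname-of-master", 6123) should return (True, ""). A timeout here, with the port in LISTEN state on the master, would point to a firewall or routing issue between the two VMs rather than to the Flink configuration.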
Kind Regards,
Ravinder Kaur
On Wed, Feb 3, 2016 at 8:12 PM, Stephan Ewen <[email protected]> wrote:
> This looks like the reason:
>
> java.net.UnknownHostException: Cannot resolve the JobManager hostname
> 'hostname-of-master' specified in the configuration
>
> On Wed, Feb 3, 2016 at 7:29 PM, Ravinder Kaur <[email protected]> wrote:
>
>> Hello,
>>
>> The TaskManager log file now shows the following:
>>
>> 18:27:10,082 WARN org.apache.hadoop.util.NativeCodeLoader
>> - Unable to load native-hadoop library for your platform... using
>> builtin-java classes where applicable
>> 18:27:10,244 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> -
>> --------------------------------------------------------------------------------
>> 18:27:10,244 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - Starting TaskManager (Version: 0.10.1, Rev:2e9b231,
>> Date:22.11.2015 @ 12:41:12 CET)
>> 18:27:10,244 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - Current user: flink
>> 18:27:10,245 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.7/24.91-b01
>> 18:27:10,245 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - Maximum heap size: 491 MiBytes
>> 18:27:10,245 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - JAVA_HOME: /usr/lib/jvm/java-1.7.0-openjdk-amd64
>> 18:27:10,247 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - Hadoop version: 2.7.0
>> 18:27:10,247 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - JVM Options:
>> 18:27:10,247 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - -Xms512M
>> 18:27:10,247 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - -Xmx512M
>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - -XX:MaxDirectMemorySize=8388607T
>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - -XX:MaxPermSize=256m
>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> -
>> -Dlog.file=/home/flink/flink-0.10.1/log/flink-flink-taskmanager-0-vm-10-155-208-137.cloud.mwn.de.log
>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> -
>> -Dlog4j.configuration=file:/home/flink/flink-0.10.1/conf/log4j.properties
>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> -
>> -Dlogback.configurationFile=file:/home/flink/flink-0.10.1/conf/logback.xml
>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - Program Arguments:
>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - --configDir
>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - /home/flink/flink-0.10.1/conf
>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - --streamingMode
>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - batch
>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - Classpath:
>> /home/flink/flink-0.10.1/lib/flink-dist_2.11-0.10.1.jar:/home/flink/flink-0.10.1/lib/flink-python_2.11-0.10.1.jar:/home/flink/flink-0.10.1/lib/log4j-1.2.17.jar:/home/flink/flink-0.10.1/lib/slf4j-log4j12-1.7.7.jar:/usr/lib/jvm/java-1.7.0-openjdk-amd64/lib/tools.jar::
>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> -
>> --------------------------------------------------------------------------------
>> 18:27:10,252 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - Maximum number of open file descriptors is 4096
>> 18:27:10,277 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - Loading configuration from /home/flink/flink-0.10.1/conf
>> 18:27:10,356 INFO org.apache.flink.runtime.taskmanager.TaskManager
>> - Security is not enabled. Starting non-authenticated TaskManager.
>> 18:27:10,365 ERROR org.apache.flink.runtime.taskmanager.TaskManager
>> - Failed to run TaskManager.
>> java.net.UnknownHostException: Cannot resolve the JobManager hostname
>> 'hostname-of-master' specified in the configuration
>> at
>> org.apache.flink.runtime.util.StandaloneUtils.createLeaderRetrievalService(StandaloneUtils.java:79)
>> at
>> org.apache.flink.runtime.util.StandaloneUtils.createLeaderRetrievalService(StandaloneUtils.java:48)
>> at
>> org.apache.flink.runtime.util.LeaderRetrievalUtils.createLeaderRetrievalService(LeaderRetrievalUtils.java:69)
>> at
>> org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndPort(TaskManager.scala:1351)
>> at
>> org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1328)
>> at
>> org.apache.flink.runtime.taskmanager.TaskManager$.main(TaskManager.scala:1240)
>> at
>> org.apache.flink.runtime.taskmanager.TaskManager.main(TaskManager.scala)
>>
>> Kind Regards,
>> Ravinder Kaur
>>
>> On Wed, Feb 3, 2016 at 7:19 PM, Stephan Ewen <[email protected]> wrote:
>>
>>> What do the TaskManager logs say?
>>>
>>> On Wed, Feb 3, 2016 at 6:34 PM, Ravinder Kaur <[email protected]>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> Thanks for the quick reply. I tried setting jobmanager.rpc.address in
>>>> flink-conf.yaml to the hostname of the master node on both nodes.
>>>>
>>>> Now the TaskManager does not start on the worker node at all. When I
>>>> start the cluster using ./bin/start-cluster.sh on the master, it shows
>>>> the normal output of starting the JobManager and TaskManager, but when
>>>> I run jps on the nodes, the slave has no TaskManager running.
>>>>
>>>> Running the WordCount example again fails with the same error, and
>>>> stopping the cluster reports that there is no TaskManager to stop.
>>>>
>>>> Kind Regards,
>>>> Ravinder Kaur
>>>>
>>>> On Wed, Feb 3, 2016 at 5:47 PM, Stephan Ewen <[email protected]> wrote:
>>>>
>>>>> Looks like the network configuration is not correct.
>>>>>
>>>>> I would try setting the full host name (like "master.abc.xyz.com") as
>>>>> jobmanager.rpc.address.
>>>>>
>>>>> Greetings,
>>>>> Stephan
>>>>>
>>>>>
>>>>> On Wed, Feb 3, 2016 at 5:43 PM, Ravinder Kaur <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Hello Community,
>>>>>>
>>>>>> I'm a student and new to Apache Flink. I'm trying to learn and have
>>>>>> set up a 2-node standalone Flink (0.10.1) cluster (one master and one
>>>>>> worker). I'm facing the following issue.
>>>>>>
>>>>>> Cluster: 2 VMs (one master and one worker)
>>>>>>
>>>>>> The configurations are done as per
>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/cluster_setup.html
>>>>>>
>>>>>> When I start the cluster both the JobManager and the TaskManager are
>>>>>> started on the master and worker respectively.
>>>>>>
>>>>>> Command to start the cluster: bin/start-cluster.sh
>>>>>>
>>>>>> jps shows all the processes running.
>>>>>>
>>>>>> Then I run the following command to run a WordCount example job:
>>>>>> ./bin/flink run ./examples/WordCount.jar
>>>>>>
>>>>>> The result is attached to this mail.
>>>>>>
>>>>>> The error is
>>>>>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
>>>>>> Not enough free slots available to run the job
>>>>>> ....................... Resources available to scheduler: Number of
>>>>>> instances=0, total number of slots=0, available slots=0
>>>>>>
>>>>>> I therefore suspected that the JobManager does not find the
>>>>>> TaskManager and checked the TaskManager logs, which indeed show that
>>>>>> the TaskManager is unable to register at the JobManager for quite a
>>>>>> long time. The TaskManager log contains these messages:
>>>>>>
>>>>>> org.apache.flink.runtime.net.ConnectionUtils: Failed to connect
>>>>>> from localhost: Connect timed out
>>>>>> org.apache.flink.runtime.net.ConnectionUtils:
>>>>>> Failed to connect from address localhost: Network is Unreachable
>>>>>>
>>>>>> Later, after a number of attempts, it starts up and tries to register
>>>>>> at the JobManager, which also fails after a lot of attempts with the
>>>>>> following messages:
>>>>>>
>>>>>> org.apache.flink.runtime.taskmanager.TaskManager:
>>>>>> Trying to register at JobManager
>>>>>> akka.tcp://flink@master:6123/user/jobmanager
>>>>>> (attempt:92, timeout:30seconds)
>>>>>>
>>>>>> org.apache.flink.runtime.taskmanager.TaskManager:
>>>>>> Tried to associate with unreachable remote host
>>>>>> [akka.tcp://flink@master:6123/user/jobmanager].
>>>>>> Address is now gated for 5000ms, all messages to this address will be
>>>>>> delivered to dead letters. Reason: Connection timed out: /master:6123
>>>>>>
>>>>>> I searched the internet for these messages and found the following
>>>>>> links helpful:
>>>>>>
>>>>>> http://stackoverflow.com/questions/33601020/flink-job-wont-run-with-higher-taskmanager-heap-mb
>>>>>> https://issues.apache.org/jira/browse/FLINK-1119
>>>>>>
>>>>>> Stephan Ewen, who provided the solution in both links, explains that
>>>>>> the TaskManagers take quite some time to register at the JobManager,
>>>>>> so I waited as long as 20 minutes after starting the cluster before
>>>>>> running the job. But even after waiting that long I get the same
>>>>>> error.
>>>>>>
>>>>>> Another suggestion was to run the cluster in streaming mode, so I
>>>>>> started it with bin/start-cluster-streaming.sh and ran the job, but
>>>>>> I get the same error.
>>>>>>
>>>>>> I re-checked all the configurations but could not find anything
>>>>>> wrong. I also checked port 6123 on the master, which is in LISTEN
>>>>>> state, while the TCP request from the worker to the master shows
>>>>>> SYN_SENT state (checked using the netstat -na and lsof -i commands).
>>>>>>
>>>>>> I opened the web page http://localhost:8081 on the master, but it
>>>>>> shows nothing, and localhost:8080 says connection refused.
>>>>>>
>>>>>> Kindly help me out as it is very important for me. Let me know if you
>>>>>> have any questions.
>>>>>>
>>>>>> Kind Regards,
>>>>>> Ravinder Kaur
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>