I am trying to set up a Flink HA cluster (ZooKeeper mode), but the Task
Manager cannot find the Job Manager.
Here is the architecture:
- Machine 1 : Job Manager + Zookeeper
- Machine 2 : Task Manager
masters:
Machine1
slaves:
Machine2
flink-conf.yaml:
#jobmanager.rpc.address: localhost
jobmanager.rpc.port: 6123
blob.server.port: 50100-50200
taskmanager.data.port: 6121
high-availability: zookeeper
high-availability.zookeeper.quorum: Machine1:2181
high-availability.zookeeper.path.root: /flink-1.5.1
high-availability.cluster-id: /default_b
high-availability.storageDir: file:///shareflink/recovery
Here is the Task Manager log; it tries to connect to localhost
instead of Machine1:
2018-08-17 10:46:44,875 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - Trying to select the network interface and address to use by connecting to the leading JobManager.
2018-08-17 10:46:44,876 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics
2018-08-17 10:46:44,966 INFO org.apache.flink.runtime.net.ConnectionUtils - Retrieved new target address /127.0.0.1:37133.
2018-08-17 10:46:45,324 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address /127.0.0.1:37133
2018-08-17 10:46:45,325 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'Machine2/IP-Machine2': Connection refused
2018-08-17 10:46:45,325 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Connection refused
2018-08-17 10:46:45,325 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/IP_Machine2': Connection refused
2018-08-17 10:46:45,325 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Connection refused
2018-08-17 10:46:45,326 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/IP_Machine2': Connection refused
2018-08-17 10:46:45,326 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Connection refused
2018-08-17 10:46:45,726 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address /127.0.0.1:37133
2018-08-17 10:46:45,727 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'Machine2/IP-Machine2
2018-08-17 10:47:22,022 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@127.0.0.1:36515] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@127.0.0.1:36515]] Caused by: [Connection refused: /127.0.0.1:36515]
2018-08-17 10:47:22,022 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@127.0.0.1:36515/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@127.0.0.1:36515/user/resourcemanager..
2018-08-17 10:47:32,037 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:36515
P.S.: **/etc/hosts** contains entries for **localhost, Machine1 and Machine2**.
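Since the Task Manager retrieves 127.0.0.1 as the target address, one thing that may be worth checking is whether /etc/hosts on Machine1 maps the machine's own hostname to a loopback address. That this is the cause is only an assumption. Below is a minimal Python sketch of such a check; the helper name loopback_hosts and the sample entries are hypothetical, since the actual file contents are not shown here.

```python
import ipaddress


def loopback_hosts(hosts_text, names):
    """Return {hostname: address} for every name in `names` that an
    /etc/hosts-style text maps to a loopback address."""
    found = {}
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        addr, aliases = fields[0], fields[1:]
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            # Skip lines whose first field is not a parseable IP address.
            continue
        if ip.is_loopback:
            for alias in aliases:
                if alias in names:
                    found[alias] = addr
    return found


# Hypothetical sample, NOT the real file from Machine1.
SAMPLE = """\
127.0.0.1   localhost Machine1
192.0.2.11  Machine2
"""

print(loopback_hosts(SAMPLE, {"Machine1", "Machine2"}))
# prints {'Machine1': '127.0.0.1'}
```

To run it against the real file, the sample text could be replaced with open("/etc/hosts").read() on each machine. An empty result would suggest the cause lies elsewhere.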
Can you please tell me how to get the Task Manager to connect to the Job Manager?
Regards
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/