Thanks Aljoscha. I haven't checked that bit. Is there any configuration for TaskManagers to find ZK?

Regards,
Chirag
On Wed, 14 Feb 2018 at 7:43 PM, Aljoscha Krettek <aljos...@apache.org> wrote:

Do you see in the logs whether the TaskManagers correctly connect to ZooKeeper as well? They need this in order to find the JobManager leader.

Best,
Aljoscha

On 14. Feb 2018, at 06:12, Chirag Dewan <chirag.dewa...@yahoo.in> wrote:

Hi,

I am trying to deploy a Flink cluster (1 JM, 2 TMs) on a Docker Swarm. For JobManager HA, I have started a 3-node ZooKeeper service on the same swarm network and configured Flink's ZooKeeper quorum with the ZooKeeper service instances.

The JobManager gets started with the LeaderElectionService and gets assigned a LeaderSessionID too, which I can see from the following log statements (attaching only related logs):

org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager was granted leadership with leader session ID Some(1f3b2ec6-77b6-4532-928f-ad8befd5202f).
Trying to associate with JobManager leader akka.tcp://flink@jobmanager:6123/user/jobmanager
Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#590681231] - leader session 1f3b2ec6-77b6-4532-928f-ad8befd5202f

But the TaskManagers are not able to register with the JobManager and give the following error:

Discard message LeaderSessionMessage(00000000-0000-0000-0000-000000000000,RegisterTaskManager(4fc8aceeae1e27e42b9f16df6c0cf5e3,4fc8aceeae1e27e42b9f16df6c0cf5e3 @ a118cdf39114 (dataPort=43017),cores=1, physMem=1044111360, heap=536870912, managed=324208384,1)) because the expected leader session ID 1f3b2ec6-77b6-4532-928f-ad8befd5202f did not equal the received leader session ID 00000000-0000-0000-0000-000000000000.

It seems like the ResourceManager was not able to retrieve the LeaderSessionID and passed an all-zero ID.

One interesting thing I observed was a ZK version log:

The version of ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT will be used instead.

Is this a ZK version problem? Should I be using ZK 3.4.6?

My configuration:

Flink version: 1.4.0
ZK version: 3.4.11 (I just pulled the latest image)

Thanks in advance.

Chirag
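
On the question of how TaskManagers find ZooKeeper: as far as I know there is no separate TaskManager-only setting. TaskManagers read the same high-availability keys from flink-conf.yaml as the JobManager, so those entries need to be present in the TaskManager containers as well. A minimal sketch for Flink 1.4, assuming hypothetical service names zookeeper1 to zookeeper3 on the swarm network and a placeholder storage path that every container can reach:

```yaml
# flink-conf.yaml (same entries on the JobManager and on every TaskManager)

# Switch leader election and leader retrieval to ZooKeeper.
high-availability: zookeeper

# Hypothetical host names; replace with the actual ZooKeeper service
# instances that are resolvable from inside the Flink containers.
high-availability.zookeeper.quorum: zookeeper1:2181,zookeeper2:2181,zookeeper3:2181

# Placeholder path; it must point to storage shared by all Flink containers.
high-availability.storageDir: hdfs:///flink/ha/
```

If these keys are missing on a TaskManager, it would not use ZooKeeper for leader retrieval, which could be one explanation for the all-zero leader session ID in the error above.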