Do you see in the logs whether the TaskManagers correctly connect to ZooKeeper as well? They need this in order to find the JobManager leader.
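
One rough way to check this is to scan each TaskManager log for the lines the ZooKeeper client usually prints once its session is up. The sketch below is only an illustration: the function name `zookeeper_connected` is made up, and the two marker strings are what I would expect the (shaded) ZooKeeper client and Curator to log, so adjust them to whatever your logs actually contain.

```python
# Substrings assumed to appear in a TaskManager log once the ZooKeeper
# client session is established. These are assumptions, not a Flink API;
# adapt them to the messages you actually see.
ZK_CONNECTED_MARKERS = (
    "Session establishment complete on server",  # ZooKeeper ClientCnxn
    "State change: CONNECTED",                   # Curator ConnectionStateManager
)


def zookeeper_connected(lines):
    """Return True if any log line contains one of the connection markers."""
    return any(
        marker in line
        for line in lines
        for marker in ZK_CONNECTED_MARKERS
    )
```

You could feed it an open log file, for example `zookeeper_connected(open("taskmanager.log"))`, where the file name is just a placeholder. If it returns False for a TaskManager, that TaskManager probably never reached the ZooKeeper quorum.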

Best,
Aljoscha

> On 14. Feb 2018, at 06:12, Chirag Dewan <chirag.dewa...@yahoo.in> wrote:
>
> Hi,
>
> I am trying to deploy a Flink cluster (1 JM, 2TM) on a Docker Swarm. For
> JobManager HA, I have started a 3 node zookeeper service on the same swarm
> network and configured Flink's zookeeper quorum with zookeeper service
> instances.
>
> JobManager gets started with the LeaderElectionService and gets assigned a
> LeaderSessionID too, which I can see from the following log
> statements(attaching only related logs) :
>
> org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService -
> Starting ZooKeeperLeaderElectionService
> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService -
> Starting ZooKeeperLeaderRetrievalService.
> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService -
> Starting ZooKeeperLeaderRetrievalService.
> JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager was granted
> leadership with leader session ID Some(1f3b2ec6-77b6-4532-928f-ad8befd5202f).
> Trying to associate with JobManager leader
> akka.tcp://flink@jobmanager:6123/user/jobmanager
> Resource Manager associating with leading JobManager
> Actor[akka://flink/user/jobmanager#590681231] - leader session
> 1f3b2ec6-77b6-4532-928f-ad8befd5202f
>
>
> But TaskManagers are not able to register with the JobManager and gives the
> following error:
>
> Discard message
> LeaderSessionMessage(00000000-0000-0000-0000-000000000000,RegisterTaskManager(4fc8aceeae1e27e42b9f16df6c0cf5e3,4fc8aceeae1e27e42b9f16df6c0cf5e3
> @ a118cdf39114 (dataPort=43017),cores=1, physMem=1044111360, heap=536870912,
> managed=324208384,1)) because the expected leader session ID
> 1f3b2ec6-77b6-4532-928f-ad8befd5202f did not equal the received leader
> session ID 00000000-0000-0000-0000-000000000000.
>
> Seems like the ResourceManager was not able to retrieve the LeaderSessionID
> and passed 00 ID.
>
> One interesting thing I observed was a ZK version log:
>
> The version of ZooKeeper being used doesn't support Container nodes.
> CreateMode.PERSISTENT will be used instead.
>
> Is this a ZK version problem? Should I be using ZK 3.4.6?
>
> My configuration:
>
> Flink Version : 1.4.0
> ZK version : 3.4.11 (I just pulled the latest image)
>
> Thanks in advance.
>
> Chirag