Do you see in the logs whether the TaskManagers also connect to ZooKeeper 
correctly? They need this in order to find the JobManager leader.
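
A quick way to check this is to scan the log text for two things: the line showing that ZooKeeper leader retrieval was started, and any leader session ID that is all zeros. The snippet below is only a rough sketch. The marker string and the "received leader session ID" wording are taken from the log lines quoted below; that a TaskManager in HA mode logs the same "Starting ZooKeeperLeaderRetrievalService" line is an assumption, and the function names are made up for illustration.

```python
import re
import uuid

# Marker copied from the JobManager log lines quoted in this thread.
# Assumption: a TaskManager configured for ZooKeeper HA logs the same line.
RETRIEVAL_MARKER = "Starting ZooKeeperLeaderRetrievalService"

# Matches the UUID after "received leader session ID"; \s+ tolerates
# the line wrapping that mail clients introduce.
RECEIVED_ID = re.compile(
    r"received\s+leader\s+session\s+ID\s+"
    r"([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)

NIL_UUID = uuid.UUID(int=0)


def starts_zk_retrieval(log_text):
    """True if the log shows the ZooKeeper leader retrieval service starting."""
    return RETRIEVAL_MARKER in log_text


def has_nil_session_id(log_text):
    """True if any received leader session ID in the log is all zeros."""
    return any(uuid.UUID(m) == NIL_UUID for m in RECEIVED_ID.findall(log_text))
```

If `starts_zk_retrieval` is false for a TaskManager log while `has_nil_session_id` is true for the JobManager log, that would suggest the TaskManagers are not picking up the HA configuration at all.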

Best,
Aljoscha

> On 14. Feb 2018, at 06:12, Chirag Dewan <chirag.dewa...@yahoo.in> wrote:
> 
> Hi,
> 
> I am trying to deploy a Flink cluster (1 JM, 2 TMs) on a Docker Swarm. For 
> JobManager HA, I have started a 3-node ZooKeeper service on the same swarm 
> network and configured Flink's ZooKeeper quorum with the ZooKeeper service 
> instances. 
> 
> The JobManager starts with the LeaderElectionService and is assigned a 
> LeaderSessionID too, which I can see from the following log 
> statements (attaching only the related logs):
> 
> org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  - 
> Starting ZooKeeperLeaderElectionService   
> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - 
> Starting ZooKeeperLeaderRetrievalService.
> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - 
> Starting ZooKeeperLeaderRetrievalService.
> JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager was granted 
> leadership with leader session ID Some(1f3b2ec6-77b6-4532-928f-ad8befd5202f).
>  Trying to associate with JobManager leader 
> akka.tcp://flink@jobmanager:6123/user/jobmanager
>  Resource Manager associating with leading JobManager 
> Actor[akka://flink/user/jobmanager#590681231] - leader session 
> 1f3b2ec6-77b6-4532-928f-ad8befd5202f
> 
> 
> But the TaskManagers are not able to register with the JobManager, and the 
> following message is logged:
> 
> Discard message 
> LeaderSessionMessage(00000000-0000-0000-0000-000000000000,RegisterTaskManager(4fc8aceeae1e27e42b9f16df6c0cf5e3,4fc8aceeae1e27e42b9f16df6c0cf5e3
>  @ a118cdf39114 (dataPort=43017),cores=1, physMem=1044111360, heap=536870912, 
> managed=324208384,1)) because the expected leader session ID 
> 1f3b2ec6-77b6-4532-928f-ad8befd5202f did not equal the received leader 
> session ID 00000000-0000-0000-0000-000000000000.
> 
> It seems the ResourceManager was not able to retrieve the LeaderSessionID 
> and passed an all-zero ID instead. 
> 
> One interesting thing I observed was a ZK version log:
> 
> The version of ZooKeeper being used doesn't support Container nodes. 
> CreateMode.PERSISTENT will be used instead.
> 
> Is this a ZK version problem? Should I be using ZK 3.4.6?
> 
> My configuration:
> 
> Flink Version : 1.4.0
> ZK version : 3.4.11 (I just pulled the latest image)
> 
> Thanks in advance. 
> 
> Chirag
> 
