Which release are you using? Flink 1.3.2 uses Curator 2.12.0, which fixes some leader election issues.
Mind giving 1.3.2 a try?

On Fri, Sep 22, 2017 at 4:54 AM, Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hi all,
>
> We have observed that in case some nodes of the ZK cluster are restarted
> (for a rolling restart) the Flink Streaming jobs fail (and restart).
>
> Log excerpt:
>
> 2017-09-22 12:54:41,426 INFO org.apache.zookeeper.ClientCnxn
> - Unable to read additional data from server
> sessionid 0x15cba6e1a239774, likely server has closed socket, closing
> socket connection and attempting reconnect
> 2017-09-22 12:54:41,527 INFO
> org.apache.flink.shaded.org.apache.curator.framework.state.ConnectionStateManager
> - State change: SUSPENDED
> 2017-09-22 12:54:41,528 WARN
> org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService
> - Connection to ZooKeeper suspended. The contender
> akka.tcp://fl...@splat.sto.midasplayer.com:42118/user/jobmanager no
> longer participates in the leader election.
> 2017-09-22 12:54:41,528 WARN
> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService
> - Connection to ZooKeeper suspended. Can no longer retrieve the
> leader from ZooKeeper.
> 2017-09-22 12:54:41,528 WARN
> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService
> - Connection to ZooKeeper suspended. Can no longer retrieve the
> leader from ZooKeeper.
> 2017-09-22 12:54:41,530 WARN
> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore -
> ZooKeeper connection SUSPENDED. Changes to the submitted job graphs
> are not monitored (temporarily).
> 2017-09-22 12:54:41,530 INFO org.apache.flink.yarn.YarnJobManager
> - JobManager
> akka://flink/user/jobmanager#-317276879 was revoked leadership.
> 2017-09-22 12:54:41,532 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph - Job
> event.game.log (2ad7bbcc476bbe3735954fc414ffcb97) switched from state
> RUNNING to SUSPENDED.
> java.lang.Exception: JobManager is no longer the leader.
>
>
> Is this the expected behaviour?
>
> Thanks,
> Gyula