Which release are you using?

Flink 1.3.2 uses Curator 2.12.0, which fixes some leader election issues.

Mind giving 1.3.2 a try?

On Fri, Sep 22, 2017 at 4:54 AM, Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hi all,
>
> We have observed that when some nodes of the ZK cluster are restarted
> (as part of a rolling restart), the Flink streaming jobs fail (and restart).
>
> Log excerpt:
>
> 2017-09-22 12:54:41,426 INFO  org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x15cba6e1a239774, likely server has closed socket, closing socket connection and attempting reconnect
> 2017-09-22 12:54:41,527 INFO  org.apache.flink.shaded.org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
> 2017-09-22 12:54:41,528 WARN  org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Connection to ZooKeeper suspended. The contender akka.tcp://fl...@splat.sto.midasplayer.com:42118/user/jobmanager no longer participates in the leader election.
> 2017-09-22 12:54:41,528 WARN  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Connection to ZooKeeper suspended. Can no longer retrieve the leader from ZooKeeper.
> 2017-09-22 12:54:41,528 WARN  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Connection to ZooKeeper suspended. Can no longer retrieve the leader from ZooKeeper.
> 2017-09-22 12:54:41,530 WARN  org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - ZooKeeper connection SUSPENDED. Changes to the submitted job graphs are not monitored (temporarily).
> 2017-09-22 12:54:41,530 INFO  org.apache.flink.yarn.YarnJobManager - JobManager akka://flink/user/jobmanager#-317276879 was revoked leadership.
> 2017-09-22 12:54:41,532 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph - Job event.game.log (2ad7bbcc476bbe3735954fc414ffcb97) switched from state RUNNING to SUSPENDED.
> java.lang.Exception: JobManager is no longer the leader.
>
>
> Is this the expected behaviour?
>
> Thanks,
> Gyula
>
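The quoted log shows a chain of events: the ZooKeeper server closes the socket, Curator reports SUSPENDED, the leader election service withdraws the JobManager's contender, leadership is revoked, and the job switches to SUSPENDED. The following is a minimal, hypothetical Java sketch of that reaction. The class, enum, and policy names are assumptions for illustration, not Flink's or Curator's API (only the state names mirror some of Curator's `ConnectionState` values). It contrasts revoking leadership as soon as the connection is suspended, which is the behaviour the log shows, with a hypothetical policy that revokes only once the session is LOST.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical model (NOT Flink's or Curator's actual classes) of how a
 * leader contender might react to ZooKeeper connection-state changes.
 */
public class LeaderStateSketch {

    // Mirrors the names of some of Curator's ConnectionState values seen in the log.
    enum ConnectionState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    // Hypothetical switch: which connection states count as losing leadership.
    enum Policy { REVOKE_ON_SUSPENDED, REVOKE_ON_LOST_ONLY }

    static final class Contender {
        private final Policy policy;
        private boolean leader = true; // assume this contender starts as the elected leader
        private int revocations = 0;   // each revocation would suspend (and later restart) running jobs

        Contender(Policy policy) { this.policy = policy; }

        void onStateChange(ConnectionState state) {
            switch (state) {
                case SUSPENDED:
                    // Connection dropped, but the ZK session may still be alive.
                    if (policy == Policy.REVOKE_ON_SUSPENDED) {
                        revoke();
                    }
                    break;
                case LOST:
                    // Session expired: leadership is gone under either policy.
                    revoke();
                    break;
                case CONNECTED:
                case RECONNECTED:
                    // This sketch does not re-grant leadership; a real contender
                    // would typically re-join the election here.
                    break;
            }
        }

        private void revoke() {
            if (leader) {
                leader = false;
                revocations++;
            }
        }

        boolean isLeader() { return leader; }
        int revocations() { return revocations; }
    }

    // Replays a rolling ZK restart: the server closes the socket (SUSPENDED),
    // then the client reconnects to another server before the session times out.
    static Contender replay(Policy policy) {
        Contender c = new Contender(policy);
        List<ConnectionState> events =
                Arrays.asList(ConnectionState.SUSPENDED, ConnectionState.RECONNECTED);
        for (ConnectionState s : events) {
            c.onStateChange(s);
        }
        return c;
    }

    public static void main(String[] args) {
        for (Policy p : Policy.values()) {
            Contender c = replay(p);
            System.out.println(p + ": leader=" + c.isLeader() + ", revocations=" + c.revocations());
        }
    }
}
```

Under the first policy, even a brief reconnect during a rolling restart costs one leadership revocation (and therefore a job restart); under the second it costs none. The trade-off is that a leader which keeps acting while its connection is suspended may briefly coexist with a newly elected one, which is why treating SUSPENDED as lost leadership is the conservative choice.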
