Hi Andrew,
If the ZooKeeper cluster fails and Flink is not able to connect to a
functioning quorum again, then it will basically stop working because the
JobManagers are no longer able to elect a leader among them. The lost
leadership of the JobManager can be seen in the logs (=> expected leader
s
I would think that network problems between Flink and ZooKeeper in HA mode
could indeed lead to problems. Maybe Till (in CC) has a better idea of what is
going on there.
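
To check from a Flink host whether the ZooKeeper servers are reachable and
still form a quorum, something like the following rough sketch might help. It
assumes the ZooKeeper four-letter command `srvr` is enabled on the servers and
that they listen on the default client port 2181. The helper names (`majority`,
`parse_mode`, `query_mode`, `has_quorum`) are made up for illustration and are
not part of Flink or ZooKeeper.

```python
import socket


def majority(ensemble_size: int) -> int:
    """Smallest number of voting servers that forms a quorum."""
    return ensemble_size // 2 + 1


def parse_mode(srvr_response: str):
    """Return the value of the 'Mode:' line of an `srvr` reply, or None if absent."""
    for line in srvr_response.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_mode(host: str, port: int = 2181, timeout: float = 3.0):
    """Send `srvr` to one server; None means unreachable or not serving."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # the server closes the connection after replying
                    break
                chunks.append(data)
    except OSError:  # covers refused connections and timeouts
        return None
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))


def has_quorum(modes, ensemble_size: int) -> bool:
    """True if enough servers report 'leader' or 'follower', as seen from this host."""
    serving = sum(1 for m in modes if m in ("leader", "follower"))
    return serving >= majority(ensemble_size)
```

Calling `query_mode` for each of the five servers and passing the results to
`has_quorum(modes, 5)` gives a view of the ensemble from that one host only. A
server that answers but has no `Mode:` line in its reply is counted as not
serving.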
> On 19.01.2017 at 14:55, Andrew Ge Wu wrote:
>
> Hi Stefan
>
> Yes we are running in HA mode with dedicated zookeeper cl
Hi Stefan
Yes, we are running in HA mode with a dedicated ZooKeeper cluster. As far as I
can see, it looks like a networking issue with the ZooKeeper cluster.
2 out of 5 ZooKeeper servers reported something around the same time:
server1
2017-01-19 11:52:13,044 [myid:1] - WARN
[QuorumPeer[myid=1]/0:0:0:0:0:0:0
Hi,
I think depending on your configuration of Flink (are you using high
availability mode?) and the type of ZK glitches we are talking about, it can
very well be that some of Flink's metadata in ZK got corrupted and the system
can no longer operate. But for a deeper analysis, we would need m