Hi! That is a very interesting proposition. In cases where you have a single master only, you may bet away with quite good guarantees without ZK. In fact, Flink does not store significant data in ZK at all, it only uses locks and counters.
You can have a setup without ZK, provided you have the following: - All processes restart (a lost JobManager restarts eventually). Should be given in Kubernetes. - A way for TaskManagers to discover the restarted JobManager. Should work via Kubernetes as well (restarted containers retain the external hostname) - A way to isolate different "leader sessions" against each other. Flink currently uses ZooKeeper to also attach a "leader session ID" to leader election, which is a fencing token to avoid that processes talk to each other despite having different views on who is the leader, or whether the leaser lost and re-gained leadership. - An atomic marker for what is the latest completed checkpoint. - A distributed atomic counter for the checkpoint ID. This is crucial to ensure correctness of checkpoints in the presence of JobManager failures and re-elections or split-brain situations. I would assume that etcd can provide all of those services. The best way to integrate it would probably be to add an implementation of Flink's "HighAvailabilityServices" based on etcd. Have a look at this class: https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/zookeeper/ZooKeeperHaServices.java If you want to contribute an extension of Flink using etcd, that would be awesome. This should have a FLIP though, and a plan on how to set up rigorous unit testing of that implementation (because its correctness is very crucial to Flink's HA resilience). Best, Stephan On Mon, Aug 21, 2017 at 4:15 PM, Shannon Carey <sca...@expedia.com> wrote: > Zookeeper should still be necessary even in that case, because it is where > the JobManager stores information which needs to be recovered after the > JobManager fails. > > We're eyeing https://github.com/coreos/zetcd as a way to run Zookeeper on > top of Kubernetes' etcd cluster so that we don't have to rely on a separate > Zookeeper cluster. However, we haven't tried it yet. > > -Shannon > > From: Hao Sun <ha...@zendesk.com> > Date: Sunday, August 20, 2017 at 9:04 PM > To: "user@flink.apache.org" <user@flink.apache.org> > Subject: Flink HA with Kubernetes, without Zookeeper > > Hi, I am new to Flink and trying to bring up a Flink cluster on top of > Kubernetes. > > For HA setup, with kubernetes, I think I just need one job manager and do > not need Zookeeper? I will store all states to S3 buckets. So in case of > failure, kubernetes can just bring up a new job manager without losing > anything? > > I want to confirm my assumptions above make sense. Thanks >