Thanks Shannon for the https://github.com/coreos/zetcd <https://github.com/coreos/zetcd> tips, I will check that out and share my results if we proceed on that path. Thanks Stephan for the details, this is very useful, I was about to ask what exactly is stored into zookeeper, haha.
On Mon, Aug 21, 2017 at 9:31 AM Stephan Ewen <se...@apache.org> wrote: > Hi! > > That is a very interesting proposition. In cases where you have a single > master only, you may bet away with quite good guarantees without ZK. In > fact, Flink does not store significant data in ZK at all, it only uses > locks and counters. > > You can have a setup without ZK, provided you have the following: > > - All processes restart (a lost JobManager restarts eventually). Should > be given in Kubernetes. > > - A way for TaskManagers to discover the restarted JobManager. Should > work via Kubernetes as well (restarted containers retain the external > hostname) > > - A way to isolate different "leader sessions" against each other. Flink > currently uses ZooKeeper to also attach a "leader session ID" to leader > election, which is a fencing token to avoid that processes talk to each > other despite having different views on who is the leader, or whether the > leaser lost and re-gained leadership. > > - An atomic marker for what is the latest completed checkpoint. > > - A distributed atomic counter for the checkpoint ID. This is crucial to > ensure correctness of checkpoints in the presence of JobManager failures > and re-elections or split-brain situations. > > I would assume that etcd can provide all of those services. The best way > to integrate it would probably be to add an implementation of Flink's > "HighAvailabilityServices" based on etcd. > > Have a look at this class: > https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/zookeeper/ZooKeeperHaServices.java > <https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/zookeeper/ZooKeeperHaServices.java> > > If you want to contribute an extension of Flink using etcd, that would be > awesome. > This should have a FLIP though, and a plan on how to set up rigorous unit > testing of that implementation (because its correctness is very crucial to > Flink's HA resilience). > > Best, > Stephan > > > On Mon, Aug 21, 2017 at 4:15 PM, Shannon Carey <sca...@expedia.com> wrote: > >> Zookeeper should still be necessary even in that case, because it is >> where the JobManager stores information which needs to be recovered after >> the JobManager fails. >> >> We're eyeing https://github.com/coreos/zetcd >> <https://github.com/coreos/zetcd> as a way to run >> Zookeeper on top of Kubernetes' etcd cluster so that we don't have to rely >> on a separate Zookeeper cluster. However, we haven't tried it yet. >> >> -Shannon >> >> From: Hao Sun <ha...@zendesk.com> >> Date: Sunday, August 20, 2017 at 9:04 PM >> To: "user@flink.apache.org" <user@flink.apache.org> >> Subject: Flink HA with Kubernetes, without Zookeeper >> >> Hi, I am new to Flink and trying to bring up a Flink cluster on top of >> Kubernetes. >> >> For HA setup, with kubernetes, I think I just need one job manager and do >> not need Zookeeper? I will store all states to S3 buckets. So in case of >> failure, kubernetes can just bring up a new job manager without losing >> anything? >> >> I want to confirm my assumptions above make sense. Thanks >> > >