Hi!

That is a very interesting proposition. In cases where you have a single
master only, you may bet away with quite good guarantees without ZK. In
fact, Flink does not store significant data in ZK at all, it only uses
locks and counters.

You can have a setup without ZK, provided you have the following:

  - All processes restart (a lost JobManager restarts eventually). Should
be given in Kubernetes.

  - A way for TaskManagers to discover the restarted JobManager. Should
work via Kubernetes as well (restarted containers retain the external
hostname)

  - A way to isolate different "leader sessions" against each other. Flink
currently uses ZooKeeper to also attach a "leader session ID" to leader
election, which is a fencing token to avoid that processes talk to each
other despite having different views on who is the leader, or whether the
leaser lost and re-gained leadership.

  - An atomic marker for what is the latest completed checkpoint.

  - A distributed atomic counter for the checkpoint ID. This is crucial to
ensure correctness of checkpoints in the presence of JobManager failures
and re-elections or split-brain situations.

I would assume that etcd can provide all of those services. The best way to
integrate it would probably be to add an implementation of Flink's
"HighAvailabilityServices" based on etcd.

Have a look at this class:
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/zookeeper/ZooKeeperHaServices.java

If you want to contribute an extension of Flink using etcd, that would be
awesome.
This should have a FLIP though, and a plan on how to set up rigorous unit
testing of that implementation (because its correctness is very crucial to
Flink's HA resilience).

Best,
Stephan


On Mon, Aug 21, 2017 at 4:15 PM, Shannon Carey <sca...@expedia.com> wrote:

> Zookeeper should still be necessary even in that case, because it is where
> the JobManager stores information which needs to be recovered after the
> JobManager fails.
>
> We're eyeing https://github.com/coreos/zetcd as a way to run Zookeeper on
> top of Kubernetes' etcd cluster so that we don't have to rely on a separate
> Zookeeper cluster. However, we haven't tried it yet.
>
> -Shannon
>
> From: Hao Sun <ha...@zendesk.com>
> Date: Sunday, August 20, 2017 at 9:04 PM
> To: "user@flink.apache.org" <user@flink.apache.org>
> Subject: Flink HA with Kubernetes, without Zookeeper
>
> Hi, I am new to Flink and trying to bring up a Flink cluster on top of
> Kubernetes.
>
> For HA setup, with kubernetes, I think I just need one job manager and do
> not need Zookeeper? I will store all states to S3 buckets. So in case of
> failure, kubernetes can just bring up a new job manager without losing
> anything?
>
> I want to confirm my assumptions above make sense. Thanks
>

Reply via email to