
all data is stored in a distributed file system or object store (HDFS, S3,
Ceph, ...) and ZooKeeper only stores pointers to that data.
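
To illustrate the pointer approach, here is a minimal Java sketch. It is not
Flink's actual implementation: the class PointerStoreSketch, its methods, the
local directory standing in for the distributed file system, and the in-memory
map standing in for ZooKeeper or Consul are all assumptions made for
illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch: the payload goes to a file system, the KV store keeps only its path. */
class PointerStoreSketch {
    // Assumed per-value limit of the coordination service (512 kB, as reported for Consul).
    static final int KV_VALUE_LIMIT_BYTES = 512 * 1024;

    private final Path storageDir;                               // stand-in for HDFS, S3, Ceph
    private final Map<String, byte[]> kvStore = new HashMap<>(); // stand-in for ZooKeeper/Consul

    PointerStoreSketch(Path storageDir) {
        this.storageDir = storageDir;
    }

    /** Creates a store backed by a fresh temporary directory. */
    static PointerStoreSketch inTempDir() {
        try {
            return new PointerStoreSketch(Files.createTempDirectory("ha-storage"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the payload to a new file and records only the file path under the key. */
    void put(String key, byte[] payload) {
        try {
            Path file = storageDir.resolve(UUID.randomUUID().toString());
            Files.write(file, payload);
            byte[] pointer = file.toString().getBytes(StandardCharsets.UTF_8);
            if (pointer.length > KV_VALUE_LIMIT_BYTES) {
                throw new IllegalStateException("pointer exceeds the KV value limit");
            }
            kvStore.put(key, pointer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Resolves the pointer stored under the key and reads the payload back, or returns null. */
    byte[] get(String key) {
        byte[] pointer = kvStore.get(key);
        if (pointer == null) {
            return null;
        }
        try {
            return Files.readAllBytes(Paths.get(new String(pointer, StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of bytes actually held in the KV store for the key. */
    int pointerSize(String key) {
        return kvStore.get(key).length;
    }

    public static void main(String[] args) {
        PointerStoreSketch store = PointerStoreSketch.inTempDir();
        byte[] jobGraph = new byte[2 * 1024 * 1024]; // 2 MB, larger than either limit
        store.put("jobgraphs/job-1", jobGraph);
        System.out.println("pointer bytes in KV store: " + store.pointerSize("jobgraphs/job-1"));
        System.out.println("payload bytes recovered: " + store.get("jobgraphs/job-1").length);
    }
}
```

With this layout the value kept in the coordination service is only as large
as the path, no matter how big the serialized JobGraph gets.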

> Alright, just came across the first real-life problem with my Consul HA
> implementation.
> In the Consul KV store there is a limit of 512kB per node, and the JobGraph
> of one of my apps exceeded it.
> ZooKeeper seems to have a similar znode limit of 1MB.
> How did you work around it? Or maybe I serialize the JobGraph wrong?
>> I have very little experience with ZK and cannot explain the differences
>> between ZK and Consul myself. However, there are some comparisons
>> available:
>> * https://www.consul.io/intro/vs/zookeeper.html - done by Consul so may
>> be biased
>> * https://www.slideshare.net/IvanGlushkov/zookeeper-vs-consul-41882991
>> * https://jakon.me/2017/01/consul-deployment-orchestration/
>> Regarding testing: I ran basic failover scenarios on my workstation with
>> 2 JobManagers, 2 TaskManagers and the WindowJoin example app with
>> checkpointing and restarting turned on.
>> I ran the cluster for no longer than a few hours.
>> For now I'd like to open Flink up to alternative HA backends (
>> https://issues.apache.org/jira/browse/FLINK-8660)
>>> Hello,
>>> I don't know anything about Consul, but the prospect of having other
>>> options besides ZooKeeper is very interesting. It's rather surprising how
>>> little you had to modify existing classes to get this to work.
>>> It may take a bit until someone provides proper feedback, as the
>>> community is currently prepping 2 releases (1.4.1 and 1.5); please don't
>>> be discouraged by this.
>>> I saw that your branch was based on the 1.4 version. In 1.5 we reworked
>>> the distributed architecture of Flink (in an initiative commonly referred
>>> to as FLIP-6) which may affect your work.
>>> 2 things to note from my side:
>>> It would also be helpful if you could explain the differences between ZK
>>> and Consul and how they stack up in terms of guarantees etc.
>>> How did you test your solution so far? (For example, how long was a
>>> cluster running, and which failure scenarios did you cover?)
>>> I'd like to get your opinion about this idea. I found the related JIRA
>>> issue FLINK-2366, but it seems to be dead. To get your attention, I am
>>> copying my comment here.
>>> As an experiment I've implemented Flink HA on top of Consul. The
>>> implementation works fine in the "lab" but is not battle-tested yet.
>>> The source code is available at
>>> https://github.com/kbialek/flink/tree/feature/consul (flink-runtime
>>> package org.apache.flink.runtime.consul)
>>> Why? Generally, I'd like to keep as few moving parts as possible. We do
>>> not have ZooKeeper running, but Consul is already in place. And in the
>>> end, freedom of choice is a good thing.
>>> It would be great to see built-in Consul support in Flink someday, but
>>> if that is not expected, then I suggest a little refactoring to make the
>>> HighAvailabilityServicesFactory configurable. As far as I can see, this
>>> should be enough to inject any HA implementation.
