I have very little experience with ZK, so I cannot explain the differences
between ZK and Consul myself. However, there are some comparisons available:

* https://www.consul.io/intro/vs/zookeeper.html - done by Consul, so it may be biased
* https://www.slideshare.net/IvanGlushkov/zookeeper-vs-consul-41882991
* https://jakon.me/2017/01/consul-deployment-orchestration/

Regarding testing: I ran basic failover scenarios on my workstation with
2 JobManagers, 2 TaskManagers, and the WindowJoin example app with
checkpointing and restarting turned on. The cluster never ran for longer
than a few hours.

For now I'd like to open Flink up to alternative HA backends
(https://issues.apache.org/jira/browse/FLINK-8660).

On Wed, Feb 14, 2018 at 1:47 PM, Chesnay Schepler <ches...@apache.org> wrote:

> Hello,
>
> I don't know anything about Consul but the prospect of having other
> options beside Zookeeper is very interesting. It's rather surprising how
> little you had to modify existing classes to get this to work.
>
> It may take a bit until someone provides proper feedback as the community
> is currently prepping 2 releases (1.4.1 and 1.5), please don't be
> discouraged by this.
>
> I saw that your branch was based on the 1.4 version. In 1.5 we reworked
> the distributed architecture of Flink (in an initiative commonly referred
> to as FLIP-6) which may affect your work.
>
> 2 things to note from my side:
> It would also be helpful if you could explain the differences between ZK
> and Consul and how they stack up in terms of guarantees etc.
> How did you test your solution so far? (Like how long was a cluster
> running, what failure scenarios)
>
>
> On 13.02.2018 21:38, Krzysztof Białek wrote:
>
> I'd like to get your opinion about this idea. I found the related JIRA
> issue FLINK-2366, but it seems to be dead. To attract your attention I
> copy my comment here.
>
> As an experiment I've implemented Flink HA on top of Consul. The
> implementation is working fine in the "lab" but is not battle tested yet.
> The source code is available at
> https://github.com/kbialek/flink/tree/feature/consul (flink-runtime package
> org.apache.flink.runtime.consul)
>
> Why? Generally I'd like to keep as few moving parts as possible. We do
> not have Zookeeper running, but Consul is already in place. And in the end
> freedom of choice is a good thing.
>
> It would be great to see built-in Consul support in Flink someday, but if
> that is not expected then I suggest a small refactoring to open up the
> possibility of configuring HighAvailabilityServicesFactory. As far as I
> can see this should be enough to inject any HA implementation.
>
> Regards,
> Krzysztof