[DISCUSS] What do we gain by supporting customized High-Availability services

Zili Chen Thu, 17 Oct 2019 04:59:48 -0700

Hi devs,

Recently the community decided not to support user-defined implementations
of the new restart strategies[1], which prompted me to think about what kind
of customization support a framework like Flink should provide.

The key idea is that pluggable does not mean customizable.

We may maintain a number of implementations of restart strategies, as well
as of high-availability services, in our codebase. But that is a fixed set,
which is definitely different from supporting arbitrary customization.

A service like the high-availability services relies underneath on quite a
lot of runtime implementation details. For example, JobGraphStore supports
#releaseJobGraphStore originally because of the ZK lock strategy;
getJobManagerRetriever requires a default address because
StandaloneHighAvailabilityServices is non-HA and pre-configured.
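
To illustrate how one implementation's needs can shape a shared interface,
here is a minimal sketch. All names in it (SketchJobGraphStore, LockingStore,
InMemoryStore) are hypothetical and are not Flink's actual classes or
signatures; the "lock" is just an in-memory set standing in for a ZK-style
lock.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical interface, not Flink's real JobGraphStore.
interface SketchJobGraphStore {
    void put(String jobId, String graph);

    // Exists only because a lock-based implementation must give up its lock
    // without deleting the stored graph.
    void release(String jobId);

    boolean contains(String jobId);
}

// Stand-in for a lock-based store: putting a graph also takes a lock on it.
final class LockingStore implements SketchJobGraphStore {
    private final Map<String, String> graphs = new HashMap<>();
    private final Set<String> locks = new HashSet<>();

    public void put(String jobId, String graph) {
        graphs.put(jobId, graph);
        locks.add(jobId);
    }

    public void release(String jobId) {
        locks.remove(jobId); // the graph itself stays in the store
    }

    public boolean contains(String jobId) {
        return graphs.containsKey(jobId);
    }

    boolean isLocked(String jobId) {
        return locks.contains(jobId);
    }
}

// A store without locks still has to implement release(), as a no-op.
final class InMemoryStore implements SketchJobGraphStore {
    private final Map<String, String> graphs = new HashMap<>();

    public void put(String jobId, String graph) {
        graphs.put(jobId, graph);
    }

    public void release(String jobId) {
        // nothing to release
    }

    public boolean contains(String jobId) {
        return graphs.containsKey(jobId);
    }
}
```

A third-party implementation written against such an interface would have
to guess what release() is supposed to mean for it.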

Such interfaces, however, are likely to evolve with the Flink runtime
implementation, such as cluster management and coordination details. If we
support customizing them, these internal high-availability services
interfaces become public interfaces. If we keep them pluggable, we can
extend them in response to runtime evolution while ensuring that the
implementations stay in a fixed set, and we can still introduce new
implementations (such as etcd[2] or MapDB[3]) if they are a good fit.
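
The distinction can be sketched roughly as follows. This is only an
illustration under my own assumptions: HaServices, HaMode and CustomHaLoader
are hypothetical names, not Flink's actual API. The enum stands for the
pluggable approach (a closed set owned by the codebase), while the
reflective loader stands for the customizable approach (any user-supplied
class is accepted).

```java
import java.util.Locale;

// Hypothetical names throughout; this is not Flink's actual API.
interface HaServices {
    String describe();
}

final class StandaloneHa implements HaServices {
    public String describe() {
        return "standalone";
    }
}

final class ZooKeeperHa implements HaServices {
    public String describe() {
        return "zookeeper";
    }
}

// Pluggable: the framework owns a closed set of implementations.
// Adding etcd would mean adding a constant and a class to the codebase.
enum HaMode {
    NONE, ZOOKEEPER;

    HaServices create() {
        switch (this) {
            case NONE:
                return new StandaloneHa();
            case ZOOKEEPER:
                return new ZooKeeperHa();
            default:
                throw new IllegalStateException("Unknown mode: " + this);
        }
    }

    // Throws IllegalArgumentException for anything outside the fixed set.
    static HaMode fromConfig(String value) {
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}

// Customizable: any class name is accepted, so HaServices effectively
// becomes a public contract that external code compiles against.
final class CustomHaLoader {
    static HaServices load(String className) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        return (HaServices) clazz.getDeclaredConstructor().newInstance();
    }
}

class HaSelectionDemo {
    public static void main(String[] args) throws Exception {
        System.out.println(HaMode.fromConfig("zookeeper").create().describe());
        System.out.println(CustomHaLoader.load(StandaloneHa.class.getName()).describe());
    }
}
```

With only the enum, changing HaServices is an internal refactoring. Once
something like the loader exists, every change to HaServices can break code
we have never seen.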

We don't support customizing the ResourceManager, although it is pluggable
in the sense that others can implement a Kubernetes resource manager[4].
Maybe this is a better way to handle high-availability services: pluggable,
but not customizable.

Looking forward to your ideas. To be clear, I'm not trying to drop it now,
but I'm a bit confused about this topic and would like to turn to the
wisdom of our community.

Best,
tison.

[1] https://lists.apache.org/x/thread.html/6ed95eb6a91168dba09901e158bc1b6f4b08f1e176db4641f79de765@%3Cdev.flink.apache.org%3E
[2] https://issues.apache.org/jira/browse/FLINK-11105
[3] https://lists.apache.org/x/thread.html/eae4cbdf6dac466bc0247e3bc1a7a69fe7e1db7a512fcd607e9c081b@%3Cuser.flink.apache.org%3E
[4] https://github.com/tianchen92/flink/tree/k8s-master/flink-kubernete
