Thanks for preparing this FLIP, @Yang. In general, I'm +1 for this new feature. Leveraging Kubernetes' built-in ConfigMap for Flink's HA services should significantly reduce the maintenance overhead compared to deploying a ZK cluster. I think this is an attractive feature for users.

Concerning the proposed design, I have some questions. They might not be problems; I'm just trying to understand.

## Architecture

Why does the leader election need two ConfigMaps (`lock for contending leader` and `leader RPC address`)? What happens if the two ConfigMaps are not updated consistently? E.g., a TM learns that a new JM has become leader (`lock for contending leader` updated), but still gets the old leader's address when it reads `leader RPC address`?

## HA storage > Lock and release

It seems to me that the owner needs to explicitly release the lock so that other peers can write/remove the stored object. What if the previous owner fails to release the lock (e.g., it dies before releasing)? Would that cause any problem?

## HA storage > HA data clean up

If the ConfigMap is destroyed on `kubectl delete deploy <ClusterID>`, how is the HA data retained?

Thank you~

Xintong Song

On Tue, Sep 15, 2020 at 11:26 AM Yang Wang <danrtsey...@gmail.com> wrote:

> Hi devs and users,
>
> I would like to start the discussion about FLIP-144[1], which will introduce
> a new native high availability service for Kubernetes.
>
> Currently, Flink has provided Zookeeper HA service and been widely used
> in production environments. It could be integrated in standalone cluster,
> Yarn, Kubernetes deployments. However, using the Zookeeper HA in K8s
> will take additional cost since we need to manage a Zookeeper cluster.
> In the meantime, K8s has provided some public API for leader election[2]
> and configuration storage(i.e. ConfigMap[3]). We could leverage these
> features and make running HA configured Flink cluster on K8s more
> convenient.
>
> Both the standalone on K8s and native K8s could benefit from the new
> introduced KubernetesHaService.
>
> [1]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-144%3A+Native+Kubernetes+HA+for+Flink
> [2]. https://kubernetes.io/blog/2016/01/simple-leader-election-with-kubernetes/
> [3]. https://kubernetes.io/docs/concepts/configuration/configmap/
>
> Looking forward to your feedback.
>
> Best,
> Yang
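
P.S. To make the Architecture question concrete, here is a minimal in-memory sketch of the interleaving I have in mind. It uses plain Python dicts as stand-ins for the two ConfigMaps and does not touch the Kubernetes API. The names `lock_cm`, `address_cm`, `read_view`, `switch_leader_in_two_steps`, and the JM ids and addresses are all made up for illustration; none of this is taken from the FLIP, so it may not match the actual design.

```python
# In-memory stand-ins for the two ConfigMaps (hypothetical; not the Kubernetes API).
lock_cm = {"leader": "jm-1"}            # "lock for contending leader"
address_cm = {"address": "jm-1:6123"}   # "leader RPC address"


def read_view():
    """Return what a TM would see if it read both ConfigMaps right now."""
    return lock_cm["leader"], address_cm["address"]


def switch_leader_in_two_steps(new_leader, new_address):
    """Update the two ConfigMaps one after the other.

    Returns the view a TM would get if it read between the two writes.
    """
    lock_cm["leader"] = new_leader        # step 1: the new JM wins the lock
    view_in_between = read_view()         # a TM reads at exactly this moment
    address_cm["address"] = new_address   # step 2: the new address is published
    return view_in_between


stale = switch_leader_in_two_steps("jm-2", "jm-2:6123")
final = read_view()
print("view between the two writes:", stale)
print("view after both writes:", final)
```

In this sketch the view in between pairs the new leader with the old address, which is the case I would like to understand: does the design rule it out, or is the TM expected to tolerate it and retry?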