[ https://issues.apache.org/jira/browse/FLINK-19545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228907#comment-17228907 ]
Yang Wang edited comment on FLINK-19545 at 11/10/20, 3:56 AM:
--------------------------------------------------------------

Actually, the IT cases above do not include an E2E test that covers submitting a job through the Flink CLI, killing the JobManager, and checking whether the Flink job recovers from the latest checkpoint successfully. This is a basic Kubernetes HA behavior test and would help us make sure it never breaks.

As for the Jepsen tests, I was not aware of this module before and will learn more about it. I think it makes sense to let it also work on Kubernetes.


was (Author: fly_in_gis):
Actually, the IT cases above do not include an E2E test that covers submitting a job through the Flink CLI, killing the JobManager, and checking whether the Flink job recovers from the latest checkpoint successfully. This is a basic Kubernetes HA behavior test and would help us make sure it never breaks.

As for the Jepsen tests, I was not aware of this project before and will learn more about it. I think it makes sense to let it also work on Kubernetes.

> Add e2e test for native Kubernetes HA
> -------------------------------------
>
>                 Key: FLINK-19545
>                 URL: https://issues.apache.org/jira/browse/FLINK-19545
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Tests
>            Reporter: Yang Wang
>            Assignee: Yang Wang
>            Priority: Major
>             Fix For: 1.12.0
>
>
> We could use minikube for the E2E tests. Start a Flink session/application
> cluster on K8s, kill one TaskManager pod or the JobManager pod, and wait for
> the job to recover from the latest checkpoint successfully.
>
> The main process in a pod can be killed with:
>
>     kubectl exec -it {pod_name} -- /bin/sh -c "kill 1"

--
This message was sent by Atlassian Jira
(v8.3.4#803005)