[ https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234734#comment-17234734 ]
Guowei Ma commented on FLINK-20113:
-----------------------------------

I tested four scenarios:

# Kubernetes
## Session Cluster
### Deploy a session cluster to k8s
### Access the JobManager web UI
### Check that the master has the KubernetesLeaderElector log
### Submit a StateMachineExample.jar job
### Verify that there are some completed checkpoints
### Kill the jobmaster pod
### Verify that the job can recover from the previous checkpoint
## Per-job Cluster
### Build a per-job image registry.cn-beijing.aliyuncs.com/streamcompute/flink:k8s-ha-per-job
### Deploy the per-job cluster
### Access the JobManager web UI
### Check that the master has the KubernetesLeaderElector log
### Verify that there are some completed checkpoints
### Kill the pod
### Verify that the job can recover from the previous checkpoint
# Native Kubernetes
## Session Cluster
### Start a native k8s session
### Access the JobManager web UI
### Check the KubernetesLeaderElector log
### Submit a StateMachineExample.jar job
### Verify that there are some completed checkpoints
### Kill the pod
### Verify that the job can recover from the previous checkpoint
## Application Cluster
### Start a Flink application
### Access the JobManager web UI
### Check the KubernetesLeaderElector log
### Kill the pod
### Verify that the job can recover from the previous checkpoint

---------------------------------------------------------------------------------

In general, the new HA service works. Most of the problems I found concern the logs and the documentation.
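The "verify completed checkpoints" and "verify recovery from the previous checkpoint" steps above could be scripted roughly as follows. This is only a sketch, not what was run for this test: it assumes the JobManager REST endpoint is reachable, and that the `/jobs/<jobid>/checkpoints` response carries a `counts.completed` number and a `latest.restored` object with an `is_savepoint` flag, which is how I understand the REST API. The helper names are made up for illustration, and the field names should be checked against the Flink version in use.

```python
import json
from urllib.request import urlopen


def fetch_checkpoint_stats(rest_url: str, job_id: str) -> dict:
    """Fetch checkpoint statistics for one job from the JobManager REST API.

    Assumption: rest_url points at the JobManager REST endpoint
    (for example a port-forwarded http://localhost:8081).
    """
    with urlopen(f"{rest_url}/jobs/{job_id}/checkpoints", timeout=10) as resp:
        return json.load(resp)


def completed_count(stats: dict) -> int:
    """Number of completed checkpoints reported in the statistics."""
    counts = stats.get("counts") or {}
    return int(counts.get("completed", 0))


def restored_from_checkpoint(stats: dict) -> bool:
    """True if the job reports a restore from a checkpoint (not a savepoint).

    Assumption: after a failover the restore shows up under latest.restored.
    """
    latest = stats.get("latest") or {}
    restored = latest.get("restored")
    return restored is not None and not restored.get("is_savepoint", False)
```

A possible usage: call `fetch_checkpoint_stats` before killing the pod and check that `completed_count(...)` is greater than zero, then call it again once the new JobManager is up and check `restored_from_checkpoint(...)`.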
> Test K8s High Availability Service
> ----------------------------------
>
> Key: FLINK-20113
> URL: https://issues.apache.org/jira/browse/FLINK-20113
> Project: Flink
> Issue Type: Sub-task
> Components: Deployment / Kubernetes
> Affects Versions: 1.12.0
> Reporter: Robert Metzger
> Assignee: Guowei Ma
> Priority: Critical
> Fix For: 1.12.0
>
>
> Added in https://issues.apache.org/jira/browse/FLINK-12884
>
> ----
> [General Information about the Flink 1.12 release testing|https://cwiki.apache.org/confluence/display/FLINK/1.12+Release+-+Community+Testing]
>
> When testing a feature, consider the following aspects:
> - Is the documentation easy to understand
> - Are the error messages, log messages, APIs etc. easy to understand
> - Is the feature working as expected under normal conditions
> - Is the feature working / failing as expected with invalid input, induced errors etc.
>
> If you find a problem during testing, please file a ticket (Priority=Critical; Fix Version = 1.12.0), and link it in this testing ticket.
>
> During the testing, and once you are finished, please write a short summary of all things you have tested.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)