[ https://issues.apache.org/jira/browse/FLINK-21432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289224#comment-17289224 ]
Bhagi edited comment on FLINK-21432 at 2/23/21, 5:57 PM: --------------------------------------------------------- Hi Yang, I deployed standalone flink(12.0 version) cluster on kubernetes. increased the job manager pod replicas count=3 and 2 tasks manager pods then i have configured HA with kubernetes. But after HA configured UI is not properly working. +*Flink config yaml:*+ flink-conf.yaml: |+ jobmanager.rpc.address: flink-jobmanager taskmanager.numberOfTaskSlots: 2 blob.server.port: 6124 jobmanager.rpc.port: 6123 taskmanager.rpc.port: 6122 queryable-state.proxy.ports: 6125 jobmanager.memory.process.size: 1600m taskmanager.memory.process.size: 1728m parallelism.default: 2 web.log.path: /opt/flink/log/output.log taskmanager.log.path: /opt/flink/log/output.log state.backend: rocksdb state.checkpoints.dir: [file:///persistent/flinkData/checkpoints] state.backend.rocksdb.log.dir: /persistent/flinkData/rocksdb/logging/ state.savepoints.dir: [file:///persistent/flinkData/savepoints] state.backend.incremental: true state.checkpoints.num-retained: 1 web.upload.dir: /persistent/flinkData classloader.resolve-order: parent-first kubernetes.cluster-id: 111 high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory high-availability.storageDir: [file:///persistent/flinkData/checkpoints] security.ssl.rest.enabled: true security.ssl.rest.keystore: /persistent/flinkData/ssl_certs/flink-keystore.jks security.ssl.rest.truststore: /persistent/flinkData/ssl_certs/flink-truststore.jks security.ssl.rest.keystore-password: admin123 security.ssl.rest.key-password: admin123 security.ssl.rest.truststore-password: admin123 security.ssl.verify-hostname: false 2) Created configmaps for leader election and retrieval *KubernetesLeaderElectionService.jave* |1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16|{{private}} {{final}} {{LeaderElector leaderElector = kc.leaderElector()}} {{ }}{{.withConfig(}} {{ }}{{new}} {{LeaderElectionConfigBuilder()}} {{ }}{{.withName(leaderKey)}} {{ }}{{.withLeaseDuration(Duration.ofSeconds(15L))}} {{ }}{{.withLock(}}{{new}} {{ConfigMapLock(ns, leaseName, lockIdentity))}} {{ }}{{.withRenewDeadline(Duration.ofSeconds(10L))}} {{ }}{{.withRetryPeriod(Duration.ofSeconds(2L))}} {{ }}{{.withLeaderCallbacks(}}{{new}} {{LeaderCallbacks(}} {{ }}{{this}}{{::isLeader,}} {{ }}{{this}}{{::notLeader,}} {{ }}{{newLeader -> LOG.info(}}{{"New leader elected {}."}}{{, newLeader)}} {{ }}{{))}} {{ }}{{.build())}} {{ }}{{.build();}} {{ }}{{executor.execute(leaderElector::run);}}| h2. LeaderRetrieval We could create a watcher for the ConfigMap and get the leader address in the callback handler. {panel:title=KubernetesLeaderRetrievalService.jave} {panel} |1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22|{{kubeClient.configMaps().withName(cm).watch(}}{{new}} {{Watcher<ConfigMap>() {}} {{ }}{{@Override}} {{ }}{{public}} {{void}} {{eventReceived(Action action, ConfigMap resource) {}} {{ }}{{final}} {{String name = resource.getMetadata().getName();}} {{ }}{{switch}} {{(action) {}} {{ }}{{case}} {{ADDED:}} {{ }}{{case}} {{MODIFIED:}} {{ }}{{if}} {{(resource.getData() != }}{{null}}{{) {}} {{ }}{{// TODO a new leader has been elected}} {{ }}{{}}} {{ }}{{break}}{{;}} {{ }}{{case}} {{DELETED:}} {{ }}{{listener.handleError(}}{{new}} {{Exception(}}{{"Deleted while watching the configMap "}} {{+ name));}} {{ }}{{break}}{{;}} {{ }}{{case}} {{ERROR:}} {{ }}{{listener.handleError(}}{{new}} {{Exception(}}{{"Error while watching the configMap "}} {{+ name));}} {{ }}{{break}}{{;}} {{ }}{{default}}{{:}} {{ }}{{LOG.debug(}}{{"Ignore handling {} event for configMap {}"}}{{, action, resource.getMetadata().getName());}} {{ }}{{break}}{{;}} {{ }}{{}}} {{}}}| 3) Jobmanager logs, shows leader election !image-2021-02-23-23-27-38-247.png! 4) Flink UI is displaying this error information !image-2021-02-23-23-14-13-708.png! 5) i have doubt, why leader id is showing differently for restserver, resource manager & dispatcher.is this leader information is correct ?? New leader elected 65d77de7-59c8-4d6e-a321-85c102de0d51 for 111-restserver-leader. New leader elected fc1a1d6c-6d39-4fae-8a99-a37afcd9b1c0 for 111-resourcemanager-leader New leader elected 2ddfff94-c165-4e68-b80a-fd1ca59b39a7 for 111-dispatcher-leader was (Author: bhagi__r): Hi Yang, I deployed standalone flink(12.0 version) cluster on kubernetes. increased the job manager pod replicas count=3 and 2 tasks manager pods then i have configured HA with kubernetes. But after HA configured UI is not properly working. +*Flink config yaml:*+ flink-conf.yaml: |+ jobmanager.rpc.address: flink-jobmanager taskmanager.numberOfTaskSlots: 2 blob.server.port: 6124 jobmanager.rpc.port: 6123 taskmanager.rpc.port: 6122 queryable-state.proxy.ports: 6125 jobmanager.memory.process.size: 1600m taskmanager.memory.process.size: 1728m parallelism.default: 2 web.log.path: /opt/flink/log/output.log taskmanager.log.path: /opt/flink/log/output.log state.backend: rocksdb state.checkpoints.dir: file:///persistent/flinkData/checkpoints state.backend.rocksdb.log.dir: /persistent/flinkData/rocksdb/logging/ state.savepoints.dir: file:///persistent/flinkData/savepoints state.backend.incremental: true state.checkpoints.num-retained: 1 web.upload.dir: /persistent/flinkData classloader.resolve-order: parent-first kubernetes.cluster-id: 111 high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory high-availability.storageDir: file:///persistent/flinkData/checkpoints security.ssl.rest.enabled: true security.ssl.rest.keystore: /persistent/flinkData/ssl_certs/flink-keystore.jks security.ssl.rest.truststore: /persistent/flinkData/ssl_certs/flink-truststore.jks security.ssl.rest.keystore-password: admin123 security.ssl.rest.key-password: admin123 security.ssl.rest.truststore-password: admin123 security.ssl.verify-hostname: false 2) Created configmaps for leader election and retrieval *KubernetesLeaderElectionService.jave* |1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16|{{private}} {{final}} {{LeaderElector leaderElector = kc.leaderElector()}} {{ }}{{.withConfig(}} {{ }}{{new}} {{LeaderElectionConfigBuilder()}} {{ }}{{.withName(leaderKey)}} {{ }}{{.withLeaseDuration(Duration.ofSeconds(15L))}} {{ }}{{.withLock(}}{{new}} {{ConfigMapLock(ns, leaseName, lockIdentity))}} {{ }}{{.withRenewDeadline(Duration.ofSeconds(10L))}} {{ }}{{.withRetryPeriod(Duration.ofSeconds(2L))}} {{ }}{{.withLeaderCallbacks(}}{{new}} {{LeaderCallbacks(}} {{ }}{{this}}{{::isLeader,}} {{ }}{{this}}{{::notLeader,}} {{ }}{{newLeader -> LOG.info(}}{{"New leader elected {}."}}{{, newLeader)}} {{ }}{{))}} {{ }}{{.build())}} {{ }}{{.build();}} {{ }}{{executor.execute(leaderElector::run);}}| h2. LeaderRetrieval We could create a watcher for the ConfigMap and get the leader address in the callback handler. {panel:title=KubernetesLeaderRetrievalService.jave} {panel} |1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22|{{kubeClient.configMaps().withName(cm).watch(}}{{new}} {{Watcher<ConfigMap>() {}} {{ }}{{@Override}} {{ }}{{public}} {{void}} {{eventReceived(Action action, ConfigMap resource) {}} {{ }}{{final}} {{String name = resource.getMetadata().getName();}} {{ }}{{switch}} {{(action) {}} {{ }}{{case}} {{ADDED:}} {{ }}{{case}} {{MODIFIED:}} {{ }}{{if}} {{(resource.getData() != }}{{null}}{{) {}} {{ }}{{// TODO a new leader has been elected}} {{ }}{{}}} {{ }}{{break}}{{;}} {{ }}{{case}} {{DELETED:}} {{ }}{{listener.handleError(}}{{new}} {{Exception(}}{{"Deleted while watching the configMap "}} {{+ name));}} {{ }}{{break}}{{;}} {{ }}{{case}} {{ERROR:}} {{ }}{{listener.handleError(}}{{new}} {{Exception(}}{{"Error while watching the configMap "}} {{+ name));}} {{ }}{{break}}{{;}} {{ }}{{default}}{{:}} {{ }}{{LOG.debug(}}{{"Ignore handling {} event for configMap {}"}}{{, action, resource.getMetadata().getName());}} {{ }}{{break}}{{;}} {{ }}{{}}} {{}}}| 3) Jobmanager logs, shows leader election !image-2021-02-23-23-12-03-118.png! 4) Flink UI is displaying this error information !image-2021-02-23-23-14-13-708.png! 5) i have doubt, why leader id is showing differently for restserver, resource manager & dispatcher.is this leader information is correct ?? New leader elected 65d77de7-59c8-4d6e-a321-85c102de0d51 for 111-restserver-leader. New leader elected fc1a1d6c-6d39-4fae-8a99-a37afcd9b1c0 for 111-resourcemanager-leader New leader elected 2ddfff94-c165-4e68-b80a-fd1ca59b39a7 for 111-dispatcher-leader > Web UI -- Error - {"errors":["Service temporarily unavailable due to an > ongoing leader election. Please refresh."]} > ------------------------------------------------------------------------------------------------------------------- > > Key: FLINK-21432 > URL: https://issues.apache.org/jira/browse/FLINK-21432 > Project: Flink > Issue Type: Bug > Components: Deployment / Kubernetes > Affects Versions: 1.12.0 > Environment: debian > Reporter: Bhagi > Priority: Critical > Fix For: 1.12.0 > > Attachments: image-2021-02-22-10-39-06-180.png, > image-2021-02-23-23-12-03-118.png, image-2021-02-23-23-14-13-708.png, > image-2021-02-23-23-27-38-247.png > > > Web UI – throwing this Error > {"errors":["Service temporarily unavailable due to an ongoing leader > election. Please refresh."]} > Please find Job Manager logs. > > !image-2021-02-22-10-39-06-180.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)