[ https://issues.apache.org/jira/browse/FLINK-20798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yang Wang updated FLINK-20798:
------------------------------
Description:
When deploying standalone Flink on Kubernetes with {{high-availability.storageDir}} set to a mounted PVC directory, the Flink web UI cannot be accessed normally. The JobManager log shows that the leader election succeeded, but the web UI keeps showing "Service temporarily unavailable due to an ongoing leader election. Please refresh".

The following are the related logs from the JobManager.
```
2020-12-29T06:45:54.177850394Z 2020-12-29 14:45:54,177 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader election started
2020-12-29T06:45:54.177855303Z 2020-12-29 14:45:54,177 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Attempting to acquire leader lease 'ConfigMapLock: default - mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
2020-12-29T06:45:54.178668055Z 2020-12-29 14:45:54,178 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
2020-12-29T06:45:54.178895963Z 2020-12-29 14:45:54,178 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver\{configMapName='mta-flink-resourcemanager-leader'}.
2020-12-29T06:45:54.179327491Z 2020-12-29 14:45:54,179 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ...
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@6d303498
2020-12-29T06:45:54.230081993Z 2020-12-29 14:45:54,229 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
2020-12-29T06:45:54.230202329Z 2020-12-29 14:45:54,230 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver\{configMapName='mta-flink-dispatcher-leader'}.
2020-12-29T06:45:54.230219281Z 2020-12-29 14:45:54,229 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
2020-12-29T06:45:54.230353912Z 2020-12-29 14:45:54,230 INFO org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Starting DefaultLeaderElectionService with KubernetesLeaderElectionDriver\{configMapName='mta-flink-resourcemanager-leader'}.
2020-12-29T06:45:54.237004177Z 2020-12-29 14:45:54,236 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
2020-12-29T06:45:54.237024655Z 2020-12-29 14:45:54,236 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for mta-flink-restserver-leader.
2020-12-29T06:45:54.237027811Z 2020-12-29 14:45:54,236 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Successfully Acquired leader lease 'ConfigMapLock: default - mta-flink-restserver-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
2020-12-29T06:45:54.237297376Z 2020-12-29 14:45:54,237 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant leadership to contender [http://mta-flink-jobmanager:8081|http://mta-flink-jobmanager:8081/] with session ID 9587e13f-322f-4cd5-9fff-b4941462be0f.
2020-12-29T06:45:54.237353551Z 2020-12-29 14:45:54,237 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint [] - [http://mta-flink-jobmanager:8081|http://mta-flink-jobmanager:8081/] was granted leadership with leaderSessionID=9587e13f-322f-4cd5-9fff-b4941462be0f
2020-12-29T06:45:54.237440354Z 2020-12-29 14:45:54,237 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Confirm leader session ID 9587e13f-322f-4cd5-9fff-b4941462be0f for leader [http://mta-flink-jobmanager:8081|http://mta-flink-jobmanager:8081/].
2020-12-29T06:45:54.254555127Z 2020-12-29 14:45:54,254 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
2020-12-29T06:45:54.254588299Z 2020-12-29 14:45:54,254 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for mta-flink-resourcemanager-leader.
2020-12-29T06:45:54.254628053Z 2020-12-29 14:45:54,254 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Successfully Acquired leader lease 'ConfigMapLock: default - mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
2020-12-29T06:45:54.254871569Z 2020-12-29 14:45:54,254 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant leadership to contender LeaderContender: StandaloneResourceManager with session ID b1730dc6-0f94-49f4-b519-56917f3027b7.
2020-12-29T06:45:54.256608291Z 2020-12-29 14:45:54,256 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Attempting to renew leader lease 'ConfigMapLock: default - mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
2020-12-29T06:45:54.259155793Z 2020-12-29 14:45:54,258 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
2020-12-29T06:45:54.259176091Z 2020-12-29 14:45:54,258 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for mta-flink-dispatcher-leader.
2020-12-29T06:45:54.25918096Z 2020-12-29 14:45:54,259 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Successfully Acquired leader lease 'ConfigMapLock: default - mta-flink-dispatcher-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
2020-12-29T06:45:54.259362149Z 2020-12-29 14:45:54,259 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant leadership to contender LeaderContender: DefaultDispatcherRunner with session ID fbbaa883-69f6-43df-9ca0-c646bc1baad1.
2020-12-29T06:45:54.260301799Z 2020-12-29 14:45:54,260 DEBUG org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherRunner [] - Create new DispatcherLeaderProcess with leader session id fbbaa883-69f6-43df-9ca0-c646bc1baad1.
2020-12-29T06:45:54.266724597Z 2020-12-29 14:45:54,266 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Start SessionDispatcherLeaderProcess.
2020-12-29T06:45:54.267718418Z 2020-12-29 14:45:54,267 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Attempting to renew leader lease 'ConfigMapLock: default - mta-flink-dispatcher-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
2020-12-29T06:45:54.26786349Z 2020-12-29 14:45:54,267 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Recover all persisted job graphs.
2020-12-29T06:45:54.267976912Z 2020-12-29 14:45:54,267 DEBUG org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieving all stored job ids from KubernetesStateHandleStore\{configMapName='mta-flink-dispatcher-leader'}.
2020-12-29T06:45:54.277681598Z 2020-12-29 14:45:54,277 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - ResourceManager akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0 was granted leadership with fencing token b51956917f3027b7b1730dc60f9449f4
2020-12-29T06:45:54.280411279Z 2020-12-29 14:45:54,280 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Starting the SlotManager.
2020-12-29T06:45:54.281367931Z 2020-12-29 14:45:54,281 DEBUG org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] - Successfully wrote leader information: Leader=[http://mta-flink-jobmanager:8081|http://mta-flink-jobmanager:8081/], session ID=9587e13f-322f-4cd5-9fff-b4941462be0f.
2020-12-29T06:45:54.281528772Z 2020-12-29 14:45:54,281 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Attempting to renew leader lease 'ConfigMapLock: default - mta-flink-restserver-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
2020-12-29T06:45:54.286191344Z 2020-12-29 14:45:54,286 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
2020-12-29T06:45:54.286304807Z 2020-12-29 14:45:54,286 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
2020-12-29T06:45:54.286438227Z 2020-12-29 14:45:54,286 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Confirm leader session ID b1730dc6-0f94-49f4-b519-56917f3027b7 for leader akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0.
2020-12-29T06:45:54.309361096Z 2020-12-29 14:45:54,309 DEBUG org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] - Successfully wrote leader information: Leader=akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0, session ID=b1730dc6-0f94-49f4-b519-56917f3027b7.
2020-12-29T06:45:54.320673232Z 2020-12-29 14:45:54,320 INFO org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job ids [] from KubernetesStateHandleStore\{configMapName='mta-flink-dispatcher-leader'}
2020-12-29T06:45:54.3206989Z 2020-12-29 14:45:54,320 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Successfully recovered 0 persisted job graphs.
2020-12-29T06:45:54.324829616Z 2020-12-29 14:45:54,324 DEBUG org.apache.flink.runtime.rpc.akka.SupervisorActor [] - Starting FencedAkkaRpcActor with name dispatcher_1.
2020-12-29T06:45:54.325343659Z 2020-12-29 14:45:54,325 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/rpc/dispatcher_1 .
2020-12-29T06:45:54.33778039Z 2020-12-29 14:45:54,337 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Confirm leader session ID fbbaa883-69f6-43df-9ca0-c646bc1baad1 for leader akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/dispatcher_1.
2020-12-29T06:45:54.36249763Z 2020-12-29 14:45:54,362 DEBUG org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] - Successfully wrote leader information: Leader=akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/dispatcher_1, session ID=fbbaa883-69f6-43df-9ca0-c646bc1baad1.
2020-12-29T06:46:04.298366262Z 2020-12-29 14:46:04,297 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
2020-12-29T06:46:04.298442695Z 2020-12-29 14:46:04,298 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
2020-12-29T06:46:14.318174464Z 2020-12-29 14:46:14,317 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
2020-12-29T06:46:14.318256849Z 2020-12-29 14:46:14,318 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
2020-12-29T06:46:24.337694477Z 2020-12-29 14:46:24,337 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
2020-12-29T06:46:24.337816516Z 2020-12-29 14:46:24,337 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
2020-12-29T06:46:26.044624193Z 2020-12-29 14:46:26,044 DEBUG org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf [] - -Dorg.apache.flink.shaded.netty4.io.netty.buffer.checkAccessible: true
```

> Using PVC as high-availability.storageDir could not work
> --------------------------------------------------------
>
> Key: FLINK-20798
> URL: https://issues.apache.org/jira/browse/FLINK-20798
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.12.0
> Environment: FLINK 1.12.0
> Reporter: hayden zhou
> Priority: Major
> Attachments: flink.log

--
This message was sent by Atlassian Jira (v8.3.4#803005)