[ https://issues.apache.org/jira/browse/FLINK-20798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yang Wang updated FLINK-20798:
------------------------------
    Summary: Using PVC as high-availability.storageDir could not work  (was: Service temporarily unavailable due to an ongoing leader election. Please refresh.)

> Using PVC as high-availability.storageDir could not work
> --------------------------------------------------------
>
>                 Key: FLINK-20798
>                 URL: https://issues.apache.org/jira/browse/FLINK-20798
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.12.0
>         Environment: FLINK 1.12.0
>            Reporter: hayden zhou
>            Priority: Major
>
> I deployed Flink to Kubernetes (k8s) using a PVC as the high-availability.storageDir. The JobManager log shows that the leader election succeeded, but the web UI keeps reporting that the leader election is still in progress.
>
> Below is the JobManager log:
> ```
> 2020-12-29T06:45:54.177850394Z 2020-12-29 14:45:54,177 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader election started
> 2020-12-29T06:45:54.177855303Z 2020-12-29 14:45:54,177 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Attempting to acquire leader lease 'ConfigMapLock: default - mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
> 2020-12-29T06:45:54.178668055Z 2020-12-29 14:45:54,178 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
> 2020-12-29T06:45:54.178895963Z 2020-12-29 14:45:54,178 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver{configMapName='mta-flink-resourcemanager-leader'}.
> 2020-12-29T06:45:54.179327491Z 2020-12-29 14:45:54,179 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@6d303498
> 2020-12-29T06:45:54.230081993Z 2020-12-29 14:45:54,229 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
> 2020-12-29T06:45:54.230202329Z 2020-12-29 14:45:54,230 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver{configMapName='mta-flink-dispatcher-leader'}.
> 2020-12-29T06:45:54.230219281Z 2020-12-29 14:45:54,229 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
> 2020-12-29T06:45:54.230353912Z 2020-12-29 14:45:54,230 INFO org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Starting DefaultLeaderElectionService with KubernetesLeaderElectionDriver{configMapName='mta-flink-resourcemanager-leader'}.
> 2020-12-29T06:45:54.237004177Z 2020-12-29 14:45:54,236 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
> 2020-12-29T06:45:54.237024655Z 2020-12-29 14:45:54,236 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for mta-flink-restserver-leader.
> 2020-12-29T06:45:54.237027811Z 2020-12-29 14:45:54,236 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Successfully Acquired leader lease 'ConfigMapLock: default - mta-flink-restserver-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
> 2020-12-29T06:45:54.237297376Z 2020-12-29 14:45:54,237 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant leadership to contender http://mta-flink-jobmanager:8081 with session ID 9587e13f-322f-4cd5-9fff-b4941462be0f.
> 2020-12-29T06:45:54.237353551Z 2020-12-29 14:45:54,237 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint [] - http://mta-flink-jobmanager:8081 was granted leadership with leaderSessionID=9587e13f-322f-4cd5-9fff-b4941462be0f
> 2020-12-29T06:45:54.237440354Z 2020-12-29 14:45:54,237 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Confirm leader session ID 9587e13f-322f-4cd5-9fff-b4941462be0f for leader http://mta-flink-jobmanager:8081.
> 2020-12-29T06:45:54.254555127Z 2020-12-29 14:45:54,254 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
> 2020-12-29T06:45:54.254588299Z 2020-12-29 14:45:54,254 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for mta-flink-resourcemanager-leader.
> 2020-12-29T06:45:54.254628053Z 2020-12-29 14:45:54,254 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Successfully Acquired leader lease 'ConfigMapLock: default - mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
> 2020-12-29T06:45:54.254871569Z 2020-12-29 14:45:54,254 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant leadership to contender LeaderContender: StandaloneResourceManager with session ID b1730dc6-0f94-49f4-b519-56917f3027b7.
> 2020-12-29T06:45:54.256608291Z 2020-12-29 14:45:54,256 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Attempting to renew leader lease 'ConfigMapLock: default - mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
> 2020-12-29T06:45:54.259155793Z 2020-12-29 14:45:54,258 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
> 2020-12-29T06:45:54.259176091Z 2020-12-29 14:45:54,258 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for mta-flink-dispatcher-leader.
> 2020-12-29T06:45:54.25918096Z 2020-12-29 14:45:54,259 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Successfully Acquired leader lease 'ConfigMapLock: default - mta-flink-dispatcher-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
> 2020-12-29T06:45:54.259362149Z 2020-12-29 14:45:54,259 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant leadership to contender LeaderContender: DefaultDispatcherRunner with session ID fbbaa883-69f6-43df-9ca0-c646bc1baad1.
> 2020-12-29T06:45:54.260301799Z 2020-12-29 14:45:54,260 DEBUG org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherRunner [] - Create new DispatcherLeaderProcess with leader session id fbbaa883-69f6-43df-9ca0-c646bc1baad1.
> 2020-12-29T06:45:54.266724597Z 2020-12-29 14:45:54,266 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Start SessionDispatcherLeaderProcess.
> 2020-12-29T06:45:54.267718418Z 2020-12-29 14:45:54,267 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Attempting to renew leader lease 'ConfigMapLock: default - mta-flink-dispatcher-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
> 2020-12-29T06:45:54.26786349Z 2020-12-29 14:45:54,267 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Recover all persisted job graphs.
> 2020-12-29T06:45:54.267976912Z 2020-12-29 14:45:54,267 DEBUG org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieving all stored job ids from KubernetesStateHandleStore{configMapName='mta-flink-dispatcher-leader'}.
> 2020-12-29T06:45:54.277681598Z 2020-12-29 14:45:54,277 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - ResourceManager akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0 was granted leadership with fencing token b51956917f3027b7b1730dc60f9449f4
> 2020-12-29T06:45:54.280411279Z 2020-12-29 14:45:54,280 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Starting the SlotManager.
> 2020-12-29T06:45:54.281367931Z 2020-12-29 14:45:54,281 DEBUG org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] - Successfully wrote leader information: Leader=http://mta-flink-jobmanager:8081, session ID=9587e13f-322f-4cd5-9fff-b4941462be0f.
> 2020-12-29T06:45:54.281528772Z 2020-12-29 14:45:54,281 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Attempting to renew leader lease 'ConfigMapLock: default - mta-flink-restserver-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
> 2020-12-29T06:45:54.286191344Z 2020-12-29 14:45:54,286 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
> 2020-12-29T06:45:54.286304807Z 2020-12-29 14:45:54,286 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
> 2020-12-29T06:45:54.286438227Z 2020-12-29 14:45:54,286 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Confirm leader session ID b1730dc6-0f94-49f4-b519-56917f3027b7 for leader akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0.
> 2020-12-29T06:45:54.309361096Z 2020-12-29 14:45:54,309 DEBUG org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] - Successfully wrote leader information: Leader=akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0, session ID=b1730dc6-0f94-49f4-b519-56917f3027b7.
> 2020-12-29T06:45:54.320673232Z 2020-12-29 14:45:54,320 INFO org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job ids [] from KubernetesStateHandleStore{configMapName='mta-flink-dispatcher-leader'}
> 2020-12-29T06:45:54.3206989Z 2020-12-29 14:45:54,320 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Successfully recovered 0 persisted job graphs.
> 2020-12-29T06:45:54.324829616Z 2020-12-29 14:45:54,324 DEBUG org.apache.flink.runtime.rpc.akka.SupervisorActor [] - Starting FencedAkkaRpcActor with name dispatcher_1.
> 2020-12-29T06:45:54.325343659Z 2020-12-29 14:45:54,325 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/rpc/dispatcher_1 .
> 2020-12-29T06:45:54.33778039Z 2020-12-29 14:45:54,337 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Confirm leader session ID fbbaa883-69f6-43df-9ca0-c646bc1baad1 for leader akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/dispatcher_1.
> 2020-12-29T06:45:54.36249763Z 2020-12-29 14:45:54,362 DEBUG org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] - Successfully wrote leader information: Leader=akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/dispatcher_1, session ID=fbbaa883-69f6-43df-9ca0-c646bc1baad1.
> 2020-12-29T06:46:04.298366262Z 2020-12-29 14:46:04,297 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
> 2020-12-29T06:46:04.298442695Z 2020-12-29 14:46:04,298 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
> 2020-12-29T06:46:14.318174464Z 2020-12-29 14:46:14,317 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
> 2020-12-29T06:46:14.318256849Z 2020-12-29 14:46:14,318 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
> 2020-12-29T06:46:24.337694477Z 2020-12-29 14:46:24,337 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
> 2020-12-29T06:46:24.337816516Z 2020-12-29 14:46:24,337 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
> 2020-12-29T06:46:26.044624193Z 2020-12-29 14:46:26,044 DEBUG org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf [] - -Dorg.apache.flink.shaded.netty4.io.netty.buffer.checkAccessible: true
> ```

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
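In the JobManager log quoted in this issue, the same lock identity (6f6479c6-86cc-4d62-84f9-37ff968bd0e5) acquires the restserver, resourcemanager and dispatcher leader ConfigMaps, and leader information is written for all three. To check another JobManager log for the same pattern, a minimal Python sketch can extract those events. The sketch makes two assumptions: the log is saved with one entry per line, and only the two message formats quoted in this log need matching. The helper name summarize_leader_election is hypothetical and is not part of Flink.

```python
import re
import sys

# Matches e.g. "New leader elected <lock identity> for <ConfigMap name>."
ELECTED = re.compile(r"New leader elected (\S+) for (\S+?)\.?$")
# Matches e.g. "Successfully wrote leader information: Leader=<address>, session ID=<uuid>."
WROTE = re.compile(
    r"Successfully wrote leader information: Leader=(\S+), session ID=([0-9a-f-]+)\.?$"
)


def summarize_leader_election(lines):
    """Hypothetical helper: collect leader-election events from JobManager log lines.

    Returns (elected, written):
      elected maps ConfigMap name -> lock identity that acquired its lease
      written maps leader address -> session ID written to the ConfigMap
    """
    elected, written = {}, {}
    for line in lines:
        line = line.rstrip()  # drop trailing newline so "$" anchors at the message end
        m = ELECTED.search(line)
        if m:
            elected[m.group(2)] = m.group(1)
            continue
        m = WROTE.search(line)
        if m:
            written[m.group(1)] = m.group(2)
    return elected, written


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed file name): python summarize_leader_election.py jobmanager.log
    with open(sys.argv[1], encoding="utf-8") as log:
        elected, written = summarize_leader_election(log)
    for config_map, identity in sorted(elected.items()):
        print(f"{config_map}: lease held by {identity}")
    for address, session_id in sorted(written.items()):
        print(f"{address}: session {session_id}")
```

Because the patterns use search rather than a start-of-line match, lines that keep the email's "> " prefix still match. The sketch only confirms what the JobManager itself logged. It does not explain why the web UI still reports an ongoing election.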