flink-kubernetes-operator deployment problem

圣万 Mon, 09 Jan 2023 01:16:36 -0800

Hello,
I have recently been trying to deploy Flink with flink-kubernetes-operator. I found some examples in the official GitHub project, but deploying one of them failed with an error. Could you help me figure out what is wrong? Thank you!
Project: flink-kubernetes-operator/examples at main, apache/flink-kubernetes-operator (github.com) <https://github.com/apache/flink-kubernetes-operator/tree/main/examples>
Example used: basic-checkpoint-ha.yaml <https://github.com/apache/flink-kubernetes-operator/blob/main/examples/basic-checkpoint-ha.yaml>
Its contents are as follows:
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-checkpoint-ha-example
spec:
  image: flink:1.15
  flinkVersion: v1_15
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    state.savepoints.dir: file:///flink-data/savepoints
    state.checkpoints.dir: file:///flink-data/checkpoints
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: file:///flink-data/ha
  serviceAccount: flink
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
          - mountPath: /flink-data
            name: flink-volume
      volumes:
      - name: flink-volume
        hostPath:
          # directory location on host
          path: /tmp/flink
          # this field is optional
          type: Directory
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: savepoint
    state: running
    savepointTriggerNonce: 0



The error output is as follows:
2023-01-05 18:51:12,176 INFO  org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Stopping SessionDispatcherLeaderProcess.
2023-01-05 18:51:12,185 INFO  org.apache.flink.runtime.jobmanager.DefaultJobGraphStore     [] - Stopping DefaultJobGraphStore.
2023-01-05 18:51:12,191 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal error occurred in the cluster entrypoint.
java.util.concurrent.CompletionException: java.lang.IllegalStateException: The base directory of the JobResultStore isn't accessible. No dirty JobResults can be restored.
     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) ~[?:1.8.0_352]
     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) [?:1.8.0_352]
     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606) [?:1.8.0_352]
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_352]
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_352]
     at java.lang.Thread.run(Thread.java:750) [?:1.8.0_352]
Caused by: java.lang.IllegalStateException: The base directory of the JobResultStore isn't accessible. No dirty JobResults can be restored.
     at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) ~[flink-dist-1.16.0.jar:1.16.0]
     at org.apache.flink.runtime.highavailability.FileSystemJobResultStore.getDirtyResultsInternal(FileSystemJobResultStore.java:182) ~[flink-dist-1.16.0.jar:1.16.0]
     at org.apache.flink.runtime.highavailability.AbstractThreadsafeJobResultStore.withReadLock(AbstractThreadsafeJobResultStore.java:118) ~[flink-dist-1.16.0.jar:1.16.0]
     at org.apache.flink.runtime.highavailability.AbstractThreadsafeJobResultStore.getDirtyResults(AbstractThreadsafeJobResultStore.java:100) ~[flink-dist-1.16.0.jar:1.16.0]
     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.getDirtyJobResults(SessionDispatcherLeaderProcess.java:194) ~[flink-dist-1.16.0.jar:1.16.0]
     at org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198) ~[flink-dist-1.16.0.jar:1.16.0]
     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.getDirtyJobResultsIfRunning(SessionDispatcherLeaderProcess.java:188) ~[flink-dist-1.16.0.jar:1.16.0]
     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_352]
     ... 3 more
2023-01-05 18:51:12,211 INFO  org.apache.flink.runtime.blob.BlobServer                     [] - Stopped BLOB server at 0.0.0.0:6124
2023-01-05 18:51:12,574 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Starting the resource manager.
2023-01-05 18:51:13,776 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Recovered 0 pods from previous attempts, current attempt id is 1.
2023-01-05 18:51:13,777 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Recovered 0 workers from previous attempt.
2023-01-05 18:51:13,898 WARN  akka.actor.CoordinatedShutdown                               [] - Could not addJvmShutdownHook, due to: Shutdown in progress
2023-01-05 18:51:13,898 WARN  akka.actor.CoordinatedShutdown                               [] - Could not addJvmShutdownHook, due to: Shutdown in progress
2023-01-05 18:51:13,999 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Shutting down remote daemon.
2023-01-05 18:51:14,000 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Shutting down remote daemon.
2023-01-05 18:51:14,075 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remote daemon shut down; proceeding with flushing remote transports.
2023-01-05 18:51:14,076 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remote daemon shut down; proceeding with flushing remote transports.
2023-01-05 18:51:14,105 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remoting shut down.
