[ https://issues.apache.org/jira/browse/FLINK-29550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614517#comment-17614517 ]
roa commented on FLINK-29550:
-----------------------------

Below are the logged configurations (sorted):
{code:java}
blob.server.port, 6124
execution.checkpointing.externalized-checkpoint-retention, RETAIN_ON_CANCELLATION
execution.shutdown-on-application-finish, false
execution.submit-failed-job-on-application-error, true
execution.target, kubernetes-application
high-availability, org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.cluster-id, basic-checkpoint-ha-example
high-availability.jobmanager.port, 6123
high-availability.storageDir, file:///flink-data/ha
internal.cluster.execution-mode, NORMAL
job-result-store.delete-on-commit, false
job-result-store.storage-path, file:///flink-data/ha/job-result-store/basic-checkpoint-ha-example/1dbb3b0a-5051-4bf7-9ffa-6f4f73f42800
jobmanager.memory.process.size, 2048m
jobmanager.rpc.port, 6123
kubernetes.cluster-id, basic-checkpoint-ha-example
kubernetes.container.image, flink:1.15
kubernetes.internal.jobmanager.entrypoint.class, org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint
kubernetes.jobmanager.annotations, flinkdeployment.flink.apache.org/generation:7
kubernetes.jobmanager.cpu, 1.0
kubernetes.jobmanager.replicas, 1
kubernetes.namespace, devops
kubernetes.operator.metrics.reporter.slf4j.factory.class, org.apache.flink.metrics.slf4j.Slf4jReporterFactory
kubernetes.operator.metrics.reporter.slf4j.interval, 5 MINUTE
kubernetes.operator.observer.progress-check.interval, 5 s
kubernetes.operator.reconcile.interval, 15 s
kubernetes.pod-template-file, /tmp/flink_op_generated_podTemplate_1562664763208101297.yaml
kubernetes.rest-service.exposed.type, ClusterIP
kubernetes.service-account, flink
kubernetes.taskmanager.cpu, 1.0
parallelism.default, 2
pipeline.jars, local:///opt/flink/examples/streaming/StateMachineExample.jar
queryable-state.proxy.ports, 6125
state.checkpoints.dir, file:///flink-data/checkpoints
state.savepoints.dir, file:///flink-data/savepoints
taskmanager.memory.process.size, 2048m
taskmanager.numberOfTaskSlots, 2
taskmanager.rpc.port, 6122
web.cancel.enable, false
{code}

> example "basic-checkpoint-ha.yaml" not working
> ----------------------------------------------
>
>                 Key: FLINK-29550
>                 URL: https://issues.apache.org/jira/browse/FLINK-29550
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: 1.15.0
>         Environment: * Kubernetes: EKS 1.22
> * Node: bottlerocket linux
> * Manifest: https://github.com/apache/flink-kubernetes-operator/blob/release-1.1/examples/basic-checkpoint-ha.yaml
>            Reporter: roa
>            Priority: Minor
>
> Hi,
> I'm a Flink beginner, and I'm considering using the Kubernetes operator.
> Before using it, we are testing these features and examples.
> But when I tried to apply basic-checkpoint-ha.yaml, I got the error below.
> {code:java}
> 2022-10-08 17:04:08,261 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint.
> java.util.concurrent.CompletionException: java.lang.IllegalStateException: The base directory of the JobResultStore isn't accessible. No dirty JobResults can be restored.
>     at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) ~[?:?]
>     at java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source) [?:?]
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
>     at java.lang.Thread.run(Unknown Source) [?:?]
> Caused by: java.lang.IllegalStateException: The base directory of the JobResultStore isn't accessible. No dirty JobResults can be restored.
>     at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.runtime.highavailability.FileSystemJobResultStore.getDirtyResultsInternal(FileSystemJobResultStore.java:181) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.runtime.highavailability.AbstractThreadsafeJobResultStore.withReadLock(AbstractThreadsafeJobResultStore.java:118) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.runtime.highavailability.AbstractThreadsafeJobResultStore.getDirtyResults(AbstractThreadsafeJobResultStore.java:100) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.getDirtyJobResults(SessionDispatcherLeaderProcess.java:190) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.getDirtyJobResultsIfRunning(SessionDispatcherLeaderProcess.java:184) ~[flink-dist-1.15.2.jar:1.15.2]
>     ... 4 more
> 2022-10-08 17:04:08,268 INFO  org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:6124
> 2022-10-08 17:04:08,270 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Shutting KubernetesApplicationClusterEntrypoint down with application status UNKNOWN. Diagnostics Cluster entrypoint has been closed externally..
> {code}
> Could you let me know why that error occurs?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
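
The exception quoted above says that the base directory of the JobResultStore is not accessible. One way to narrow this down might be to check the directory from inside the JobManager container. The following is a minimal standalone sketch, not Flink code: the class name `DirCheck` and the method `describe` are hypothetical, and the default path `/flink-data/ha` is an assumption taken from `high-availability.storageDir` in the logged configuration with the `file://` scheme removed.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Flink: reports whether a local directory
// exists and can be read and written by the current process.
public class DirCheck {

    // Returns a short status string for the given path.
    static String describe(Path dir) {
        if (!Files.exists(dir)) {
            return "missing";
        }
        if (!Files.isDirectory(dir)) {
            return "not a directory";
        }
        if (!Files.isReadable(dir)) {
            return "not readable";
        }
        if (!Files.isWritable(dir)) {
            return "not writable";
        }
        return "accessible";
    }

    public static void main(String[] args) {
        // Assumed default: the local path behind high-availability.storageDir.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/flink-data/ha");
        System.out.println(dir + ": " + describe(dir));
    }
}
```

Running it with the path from `job-result-store.storage-path` as the argument would show whether that directory is present and usable by the user the JobManager runs as. A result other than "accessible" would be consistent with the error, though it does not by itself confirm the cause.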