Hi amenreet,

Maybe you can try using HDFS or S3 for `high-availability.storageDir`; I noticed your current job is using a local path, one starting with `file:///`. A `file:///` path lives on each JobManager pod's own local filesystem (unless it is backed by a shared volume), so on a fresh deployment the JobResultStore base directory under it may not exist yet, which is what the exception below complains about.
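For example, a minimal sketch of what the HA section could look like with shared storage (the bucket name and namenode host below are just placeholders, not taken from your setup):

high-availability.type: kubernetes
high-availability.storageDir: s3://my-flink-ha-bucket/flink/ha
# or, with HDFS instead of S3:
# high-availability.storageDir: hdfs://namenode-host:8020/flink/ha

For an s3:// path you also need one of Flink's S3 filesystem plugins (flink-s3-fs-hadoop or flink-s3-fs-presto) available in the image; with the official Docker image you can enable one via the ENABLE_BUILT_IN_PLUGINS environment variable, with the plugin jar version matching your Flink version. The JobResultStore path defaults to a subdirectory of `high-availability.storageDir`, so once the storage dir points at a filesystem every JobManager pod can reach, the "base directory of the JobResultStore isn't accessible" error should go away.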
Best,
Shammon FY

On Fri, Jul 7, 2023 at 4:20 PM amenreet sodhi <amenso...@gmail.com> wrote:

> Hi All,
>
> I am deploying a Flink cluster on Kubernetes in HA mode. I noticed that
> whenever I deploy the Flink cluster for the first time on the K8s cluster,
> it is not able to populate the cluster ConfigMap, due to which the JM fails
> with the following exception:
>
> 2023-07-06 16:46:11,428 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint.
> java.util.concurrent.CompletionException: java.lang.IllegalStateException: The base directory of the JobResultStore isn't accessible. No dirty JobResults can be restored.
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?]
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319) [?:?]
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1702) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>     at java.lang.Thread.run(Thread.java:834) [?:?]
> Caused by: java.lang.IllegalStateException: The base directory of the JobResultStore isn't accessible. No dirty JobResults can be restored.
>     at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) ~[event_executor-1.1.20.jar:?]
>     at org.apache.flink.runtime.highavailability.FileSystemJobResultStore.getDirtyResultsInternal(FileSystemJobResultStore.java:182) ~[event_executor-1.1.20.jar:?]
>     at org.apache.flink.runtime.highavailability.AbstractThreadsafeJobResultStore.withReadLock(AbstractThreadsafeJobResultStore.java:118) ~[event_executor-1.1.20.jar:?]
>     at org.apache.flink.runtime.highavailability.AbstractThreadsafeJobResultStore.getDirtyResults(AbstractThreadsafeJobResultStore.java:100) ~[event_executor-1.1.20.jar:?]
>     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.getDirtyJobResults(SessionDispatcherLeaderProcess.java:194) ~[event_executor-1.1.20.jar:?]
>     at org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198) ~[event_executor-1.1.20.jar:?]
>     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.getDirtyJobResultsIfRunning(SessionDispatcherLeaderProcess.java:188) ~[event_executor-1.1.20.jar:?]
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
>
> Once we reinstall or run a helm upgrade, the exception goes away. How can
> this be resolved? Is any additional configuration required?
>
> I am using the following configuration for HA:
>
> high-availability.storageDir: file:///opt/flink/pm/ha
> kubernetes.cluster-id: {{ include "fullname" . }}-cluster-{{ now | date "20060102150405" }}
> high-availability.jobmanager.port: 6123
> high-availability.type: kubernetes
> high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
> kubernetes.namespace: {{ .Release.Namespace }}
>
> Thanks
>
> Regards
> Amenreet Singh Sodhi