Hi All,

I am deploying a Flink cluster on Kubernetes in HA mode. Whenever I deploy the Flink cluster for the first time on the K8s cluster, it is not able to populate the cluster ConfigMap, and as a result the JobManager fails with the following exception:
2023-07-06 16:46:11,428 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint.
java.util.concurrent.CompletionException: java.lang.IllegalStateException: The base directory of the JobResultStore isn't accessible. No dirty JobResults can be restored.
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319) [?:?]
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1702) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.lang.IllegalStateException: The base directory of the JobResultStore isn't accessible. No dirty JobResults can be restored.
	at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) ~[event_executor-1.1.20.jar:?]
	at org.apache.flink.runtime.highavailability.FileSystemJobResultStore.getDirtyResultsInternal(FileSystemJobResultStore.java:182) ~[event_executor-1.1.20.jar:?]
	at org.apache.flink.runtime.highavailability.AbstractThreadsafeJobResultStore.withReadLock(AbstractThreadsafeJobResultStore.java:118) ~[event_executor-1.1.20.jar:?]
	at org.apache.flink.runtime.highavailability.AbstractThreadsafeJobResultStore.getDirtyResults(AbstractThreadsafeJobResultStore.java:100) ~[event_executor-1.1.20.jar:?]
	at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.getDirtyJobResults(SessionDispatcherLeaderProcess.java:194) ~[event_executor-1.1.20.jar:?]
	at org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198) ~[event_executor-1.1.20.jar:?]
	at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.getDirtyJobResultsIfRunning(SessionDispatcherLeaderProcess.java:188) ~[event_executor-1.1.20.jar:?]
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]

Once we reinstall or run a helm upgrade, the exception goes away. How can this be resolved? Is any additional configuration required?

I am using the following configuration for HA:

high-availability.storageDir: file:///opt/flink/pm/ha
kubernetes.cluster-id: {{ include "fullname" . }}-cluster-{{ now | date "20060102150405" }}
high-availability.jobmanager.port: 6123
high-availability.type: kubernetes
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
kubernetes.namespace: {{ .Release.Namespace }}

Thanks
Regards
Amenreet Singh Sodhi
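In case it is relevant: since high-availability.storageDir points at a local file:// path, the JobResultStore base directory must exist and be writable inside the JobManager pod. A minimal sketch of how such a path could be backed by a volume (volume, claim, and image names are placeholders, not my actual manifest):

```
# Sketch of a JobManager pod spec mounting storage under /opt/flink/pm,
# so that file:///opt/flink/pm/ha is accessible on first startup.
# "flink-ha-storage" and "flink-ha-pvc" are placeholder names.
spec:
  containers:
    - name: jobmanager
      image: flink:latest          # placeholder image
      volumeMounts:
        - name: flink-ha-storage
          mountPath: /opt/flink/pm # high-availability.storageDir lives inside this mount
  volumes:
    - name: flink-ha-storage
      persistentVolumeClaim:
        claimName: flink-ha-pvc    # placeholder claim name
```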