Hi,

We are currently running Flink in session deployment on a Kubernetes cluster, with 1 JobManager and 3 TaskManagers. To support recovery from JobManager failure, following a different mail thread, we have enabled ZooKeeper high availability using a Kubernetes Persistent Volume.
To achieve this, we've added these configuration values:

    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-noa-edge-infra:2181
    high-availability.zookeeper.path.root: /flink
    high-availability.storageDir: /flink_state
    high-availability.jobmanager.port: 6150

For the storageDir, we are using a Kubernetes persistent volume with ReadWriteOnce access mode.

Recovery from JobManager failure is working now, but it looks like there are issues with the TaskManagers. The same configuration file is used by the TaskManagers as well, and their logs contain many errors like:

    java.io.FileNotFoundException: /flink_state/flink/blob/job_9f4be579c7ab79817e25ed56762b7623/blob_p-5cf39313e388d9120c235528672fd267105be0e0-938e4347a98aa6166dc2625926fdab56 (No such file or directory)

It seems that the TaskManagers are trying to access the JobManager's storage dir. Can this be avoided? The TaskManagers do not have access to the JobManager's persistent volume; is this mandatory? And if we don't have the option to use shared storage, is there a way to make ZooKeeper hold and manage the job state instead?

Thanks,
Noa
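P.S. In case it is relevant to the question: if shared storage does turn out to be mandatory, we assume we would have to replace the ReadWriteOnce volume with one that every pod can mount, i.e. a ReadWriteMany claim. A minimal sketch of what we have in mind (the claim name, storage class, and size below are placeholders, not our actual setup):

```yaml
# Hypothetical PVC that both the JobManager and TaskManager pods would mount
# at /flink_state. Requires a storage class that supports ReadWriteMany
# (e.g. an NFS-backed provisioner); "nfs-client" is a placeholder name.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flink-ha-storage
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 10Gi
```

Is something along these lines the expected setup, or is there a way to avoid the shared mount entirely?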