Hi Koffman,

From the TM side, the only possible use that comes to mind is components like BlobCache, which is used to transfer jars or large task information between the JM and the TM. But for the BlobService specifically, if it fails to find the file, it falls back to the JM via an HTTP connection.

If convenient, could you also post the stack trace of the exception, and could you confirm whether the job still runs normally despite this exception?
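
To make the lookup order I described a bit more concrete, here is a rough Python sketch. It is not Flink's actual code: `fetch_blob`, `ha_storage_dir`, and `fetch_from_jm` are hypothetical names I made up for illustration, and the real BlobCache has more steps than this. It only shows the idea that a missing file in the HA storage directory gets logged and then the blob is requested from the JM instead.

```python
import logging
from pathlib import Path
from typing import Callable

log = logging.getLogger("blob-sketch")


def fetch_blob(
    blob_key: str,
    ha_storage_dir: Path,
    fetch_from_jm: Callable[[str], bytes],
) -> bytes:
    """Illustrative only: try the HA storage dir first, then fall back to the JM.

    `fetch_from_jm` is a hypothetical callback standing in for the
    TM-to-JM blob download; it is not a real Flink API.
    """
    try:
        # On a TM without the JM's volume mounted, this read fails.
        return (ha_storage_dir / blob_key).read_bytes()
    except FileNotFoundError as exc:
        # The failure is logged, which is the kind of error seen in TM logs,
        # but the lookup continues with the JM as the source.
        log.warning("Blob %s not found in HA storage: %s", blob_key, exc)
        return fetch_from_jm(blob_key)
```

If the real behavior matches this pattern, the FileNotFoundException would be noisy but not fatal, which is why I am asking whether the job still runs.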
Sorry if I have missed something~

Best,
Yun

------------------ Original Mail ------------------
Sender: Koffman, Noa (Nokia - IL/Kfar Sava) <noa.koff...@nokia.com>
Send Date: Thu Feb 17 05:00:42 2022
Recipients: user <user@flink.apache.org>
Subject: Task manager errors with Flink ZooKeeper High Availability

Hi,

We are currently running Flink in session deployment on a k8s cluster, with 1 job-manager and 3 task-managers.
To support recovery from job-manager failure, following a different mail thread, we have enabled ZooKeeper high availability using a k8s Persistent Volume.

To achieve this, we’ve added these conf values:

high-availability: zookeeper
high-availability.zookeeper.quorum: zk-noa-edge-infra:2181
high-availability.zookeeper.path.root: /flink
high-availability.storageDir: /flink_state
high-availability.jobmanager.port: 6150

For the storageDir, we are using a k8s persistent volume with ReadWriteOnce.

Recovery from job-manager failure is working now, but it looks like there are issues with the task-managers.
The same configuration file is used in the task-managers as well, and there are a lot of errors in the task-managers’ logs:

java.io.FileNotFoundException: /flink_state/flink/blob/job_9f4be579c7ab79817e25ed56762b7623/blob_p-5cf39313e388d9120c235528672fd267105be0e0-938e4347a98aa6166dc2625926fdab56 (No such file or directory)

It seems that the task-managers are trying to access the job-manager’s storage dir. Can this be avoided?
The task managers do not have access to the job manager’s persistent volume. Is this access mandatory?
If we don’t have the option to use shared storage, is there a way to make ZooKeeper hold and manage the job states, instead of using the shared storage?

Thanks
Noa