Hi,

We are currently running Flink in session deployment on a Kubernetes cluster, with 1 JobManager and 3 TaskManagers. To support recovery from JobManager failure, following a different mail thread, we have enabled ZooKeeper high availability using a Kubernetes Persistent Volume.
To achieve this, we've added these configuration values:

    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-noa-edge-infra:2181
    high-availability.zookeeper.path.root: /flink
    high-availability.storageDir: /flink_state
    high-availability.jobmanager.port: 6150

For the storageDir, we are using a Kubernetes persistent volume with ReadWriteOnce access mode.

Recovery from JobManager failure is working now, but it looks like there are issues with the TaskManagers. The same configuration file is used by the TaskManagers as well, and their logs contain many errors like:

    java.io.FileNotFoundException: /flink_state/flink/blob/job_9f4be579c7ab79817e25ed56762b7623/blob_p-5cf39313e388d9120c235528672fd267105be0e0-938e4347a98aa6166dc2625926fdab56 (No such file or directory)

It seems that the TaskManagers are trying to access the JobManager's storage dir. Can this be avoided? The TaskManagers do not have access to the JobManager's persistent volume; is this mandatory? And if we don't have the option to use shared storage, is there a way to make ZooKeeper hold and manage the job state instead?

Thanks,
Noa
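P.S. In case it is relevant to the question: if shared storage does turn out to be mandatory, we assume we would have to replace the ReadWriteOnce volume with one that every pod can mount, i.e. a ReadWriteMany claim. A minimal sketch of what we have in mind (the claim name, storage class, and size below are placeholders, not our actual setup):

```yaml
# Hypothetical PVC that both the JobManager and TaskManager pods would mount
# at /flink_state. Requires a storage class that supports ReadWriteMany
# (e.g. an NFS-backed provisioner); "nfs-client" is a placeholder name.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flink-ha-storage
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 10Gi
```

Is something along these lines the expected setup, or is there a way to avoid the shared mount entirely?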