Hi,

I have set up Flink 1.9.1 on Kubernetes on AWS EKS with one JobManager pod and 10
TaskManager pods, one pod per EC2 instance. The job runs fine. After a while, for
some reason, one TaskManager pod crashed and was restarted. After that, the job
got into a bad state: the parallel subtasks were showing different colors
(orange, purple) in the web UI, and I had to stop the entire job. My question
is: should a TaskManager restart affect the entire cluster/job, or should the
restarted pod rejoin gracefully?
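
In case it helps to frame the question: my understanding is that recovery after
a TaskManager loss is governed by the restart strategy in flink-conf.yaml,
e.g. something like the following (illustrative values only, not necessarily
what I am running):

    # example fixed-delay restart strategy (placeholder values)
    restart-strategy: fixed-delay
    restart-strategy.fixed-delay.attempts: 3
    restart-strategy.fixed-delay.delay: 10 s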

My second question is about autoscaling a Flink cluster on Kubernetes. If I add
more nodes/pods (TaskManager containers) to the cluster, will a running Flink
job redistribute load to the additional resources, or do I have to stop the job
with a savepoint and restart it?
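
By "stop with a savepoint and restart" I mean roughly the following (the job
ID, savepoint paths, parallelism, and jar name are all placeholders):

    # take a savepoint and stop the job (Flink 1.9 stop-with-savepoint)
    ./bin/flink stop -p s3://my-bucket/savepoints <jobId>

    # resubmit with higher parallelism after scaling up the TaskManager pods
    ./bin/flink run -s s3://my-bucket/savepoints/savepoint-xxxx -p 20 my-job.jar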

Thanks and regards,
Ivan
