Hey folks, I'm deploying a Flink cluster via Kubernetes, starting each
task manager with taskmanager.sh. I noticed that when I tell kubectl to
delete the deployment, the job-manager pod usually terminates very quickly,
but any task-manager pod that outlives the job-manager usually gets stuck
in this loop:

2020-01-29 09:18:47,867 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor
 - Could not resolve ResourceManager address
 akka.tcp://flink@job-manager:6123/user/resourcemanager, retrying in 10000 ms:
 Could not connect to rpc endpoint under address
 akka.tcp://flink@job-manager:6123/user/resourcemanager

It keeps this up for about 10 minutes(?) and then shuts down. If I deploy a
new cluster in the meantime, the pod will try to register itself with the new
job manager before terminating later. This isn't a serious issue as far as I
can tell, but I find it annoying that I sometimes have to force delete the
pods.
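For reference, the force delete I end up running looks like this (the pod
name below is just a placeholder, not my actual pod name):

```shell
# Skip the termination grace period and remove the stuck task-manager
# pod immediately. "flink-taskmanager-xyz" is a placeholder name.
kubectl delete pod flink-taskmanager-xyz --grace-period=0 --force
```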

Is there an easy way to have the task managers terminate gracefully and
quickly?

Thanks,
Li
