Hi Li,

Why do you still use 'job-manager' as the jobmanager.rpc.address for the second, new cluster? If you use another RPC address, the previous task managers would not try to register with the new job manager.
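For example, a minimal sketch of what that could look like in the flink-conf.yaml of the second cluster (the service name flink-jobmanager-v2 is just a placeholder for illustration, not something from your setup):

    # flink-conf.yaml for the second cluster
    # Point task managers at a differently-named JobManager service,
    # so leftover task managers from the old deployment cannot find it.
    jobmanager.rpc.address: flink-jobmanager-v2
    jobmanager.rpc.port: 6123

The old task managers would then keep failing to resolve their old address instead of attaching to the new job manager, and eventually give up.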
Take the Flink documentation for Kubernetes [1] as an example. With a label selector you can list or delete all pods at once:

    kubectl get pods -l app=flink
    kubectl delete pods -l app=flink

By the way, the default registration timeout is 5 minutes [2]; task managers that cannot register with the JobManager will shut themselves down after that timeout (if you want them to give up sooner, see the config sketch after the quoted message below).

[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html#session-cluster-resource-definitions
[2] https://github.com/apache/flink/blob/7e1a0f446e018681cb537dd936ae54388b5a7523/flink-core/src/main/java/org/apache/flink/configuration/TaskManagerOptions.java#L158

Best
Yun Tang

________________________________
From: Li Peng <li.p...@doordash.com>
Sent: Thursday, January 30, 2020 9:24
To: user <user@flink.apache.org>
Subject: Task-manager kubernetes pods take a long time to terminate

Hey folks, I'm deploying a Flink cluster via Kubernetes, and starting each task manager with taskmanager.sh. I noticed that when I tell kubectl to delete the deployment, the job-manager pod usually terminates very quickly, but any task-manager that doesn't get terminated before the job-manager usually gets stuck in this loop:

2020-01-29 09:18:47,867 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@job-manager:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@job-manager:6123/user/resourcemanager

It then does this for about 10 minutes(?), and then shuts down. If I'm deploying a new cluster, this pod will try to register itself with the new job manager before terminating later. This isn't a troubling issue as far as I can tell, but I find it annoying that I sometimes have to force delete the pods. Any easy ways to just have the task managers terminate gracefully and quickly?

Thanks,
Li
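Regarding the registration timeout mentioned above: if you want orphaned task managers to give up and exit faster than the 5-minute default, the timeout is configurable. A minimal sketch, assuming the option defined in TaskManagerOptions [2] (please double-check the key against your Flink version; older releases used the deprecated key taskmanager.maximum-registration-duration):

    # flink-conf.yaml
    # How long a task manager keeps retrying registration with the
    # JobManager before shutting itself down (default: 5 min).
    taskmanager.registration.timeout: 1 min

With a value like this, task managers left over after you delete the job-manager pod would terminate after about a minute instead of ten.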