Did you start your Flink cluster on K8s with the YAML from [1]? I have tested it multiple times, and it always works well. If not, could you share your YAML file with me?
[1]. https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/kubernetes.html#session-cluster-resource-definitions

Best,
Yang

Li Peng <li.p...@doordash.com> wrote on Wed, Feb 5, 2020 at 5:53 AM:

> Hey Yang,
>
> The jobmanager and taskmanagers are all part of the same deployment; when
> I delete the deployment, all the pods are told to terminate.
>
> The status of the taskmanager is "terminating", and it waits until the
> taskmanager times out in that error loop before it actually terminates.
>
> Thanks,
> Li
>
> On Thu, Jan 30, 2020 at 6:22 PM Yang Wang <danrtsey...@gmail.com> wrote:
>
>> I think if you want to delete your Flink cluster on K8s, then you need to
>> directly delete all the created deployments (jobmanager deployment,
>> taskmanager deployment). You could leave the configmap and service in
>> place if you want to reuse them for the next Flink cluster deployment.
>>
>> What's the status of the taskmanager pod when you delete it and it gets
>> stuck?
>>
>> Best,
>> Yang
>>
>> Li Peng <li.p...@doordash.com> wrote on Fri, Jan 31, 2020 at 4:51 AM:
>>
>>> Hi Yun,
>>>
>>> I'm currently specifying that specific RPC address in my Kubernetes
>>> charts for convenience. Should I be generating a new one for every
>>> deployment?
>>>
>>> And yes, I am deleting the pods using those commands. I'm just noticing
>>> that the task-manager termination process is short-circuited by the
>>> registration timeout check, so that instead of terminating quickly, the
>>> task-manager waits for 5 minutes to time out before terminating. I'm
>>> expecting it to just terminate without going through that registration
>>> timeout. Is there a way to configure that?
>>>
>>> Thanks,
>>> Li
>>>
>>> On Thu, Jan 30, 2020 at 8:53 AM Yun Tang <myas...@live.com> wrote:
>>>
>>>> Hi Li
>>>>
>>>> Why do you still use 'job-manager' as the jobmanager.rpc.address for the
>>>> second, new cluster? If you used another RPC address, the previous task
>>>> managers would not try to register with the old one.
>>>>
>>>> Take the Flink documentation [1] for K8s as an example. You can list or
>>>> delete all pods like this:
>>>>
>>>> kubectl get/delete pods -l app=flink
>>>>
>>>> By the way, the default registration timeout is 5 min [2]; any
>>>> taskmanager that cannot register with the JM will shut itself down after
>>>> 5 minutes.
>>>>
>>>> [1]
>>>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html#session-cluster-resource-definitions
>>>> [2]
>>>> https://github.com/apache/flink/blob/7e1a0f446e018681cb537dd936ae54388b5a7523/flink-core/src/main/java/org/apache/flink/configuration/TaskManagerOptions.java#L158
>>>>
>>>> Best
>>>> Yun Tang
>>>>
>>>> ------------------------------
>>>> *From:* Li Peng <li.p...@doordash.com>
>>>> *Sent:* Thursday, January 30, 2020 9:24
>>>> *To:* user <user@flink.apache.org>
>>>> *Subject:* Task-manager kubernetes pods take a long time to terminate
>>>>
>>>> Hey folks, I'm deploying a Flink cluster via Kubernetes, and starting
>>>> each task manager with taskmanager.sh. I noticed that when I tell kubectl
>>>> to delete the deployment, the job-manager pod usually terminates very
>>>> quickly, but any task-manager that doesn't get terminated before the
>>>> job-manager usually gets stuck in this loop:
>>>>
>>>> 2020-01-29 09:18:47,867 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not
>>>> resolve ResourceManager address
>>>> akka.tcp://flink@job-manager:6123/user/resourcemanager,
>>>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>>>> akka.tcp://flink@job-manager:6123/user/resourcemanager
>>>>
>>>> It then does this for about 10 minutes(?), and then shuts down. If I'm
>>>> deploying a new cluster, this pod will try to register itself with the new
>>>> job manager before terminating later. This isn't a troubling issue as far as
>>>> I can tell, but I find it annoying that I sometimes have to force delete
>>>> the pods.
>>>>
>>>> Is there an easy way to have the task managers terminate gracefully and
>>>> quickly?
>>>>
>>>> Thanks,
>>>> Li
>>>>
>>>
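
For reference, the registration timeout that Yun points to in [2] is, as far as I know, controlled by the taskmanager.registration.timeout option. A minimal sketch of lowering it in flink-conf.yaml might look like the following. The 30 s value is purely an illustrative assumption and is not something recommended anywhere in this thread.

```yaml
# flink-conf.yaml (fragment)
# Default is 5 min, as noted in the thread. The value below is an
# illustrative assumption; pick one that still leaves the job manager
# enough time to come up during a normal cluster start.
taskmanager.registration.timeout: 30 s
```

This would only shorten how long a task manager keeps retrying registration before it gives up and exits. It does not by itself change how the task manager reacts to the pod being terminated, and a value that is too low could cause task managers to exit while a job manager is still starting.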