[ https://issues.apache.org/jira/browse/FLINK-20249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236623#comment-17236623 ]

jiang7chengzitc edited comment on FLINK-20249 at 11/21/20, 9:46 AM:
--------------------------------------------------------------------

Thanks for your prompt reply. In response to your question, let me answer from 
the following aspects:

1. There is no doubt that a Service is an important concept in k8s, so it will 
occupy a certain amount of resources, although that cost is negligible.

2. More importantly, if the internal Service disappears or is withdrawn due to 
some abnormal factor while the JM and TMs are running, then after a JM failover 
the new TMs cannot re-register with the new JM and the application cannot be 
scheduled. This is quite serious and leads to a waste of resources. Using the 
JM podIP, however, does not have this problem (see the sketch after this list).

3. For the second point, I ran an experiment to verify this; see the attached 
document (k8s internal service - v2.pdf).
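
To illustrate point 2, here is a minimal sketch of the podIP alternative, 
written against the fabric8 Kubernetes client (the client library Flink's 
native k8s integration is built on). The namespace and the 
component=jobmanager selector label are assumptions about the deployment for 
illustration, not Flink's actual implementation:

{code:java}
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public class JmPodIpLookup {
    public static void main(String[] args) {
        // Assumed namespace and label; adjust to the actual deployment.
        String namespace = "default";
        try (KubernetesClient client = new DefaultKubernetesClient()) {
            // Find the JM pod by label instead of resolving the internal
            // Service's DNS name.
            Pod jmPod = client.pods()
                    .inNamespace(namespace)
                    .withLabel("component", "jobmanager")
                    .list()
                    .getItems()
                    .get(0);
            // This podIP would serve as jobmanager.rpc.address on the TM
            // side, removing the dependency on the internal Service.
            System.out.println("jobmanager.rpc.address: "
                    + jmPod.getStatus().getPodIP());
        }
    }
}
{code}

Resolving the address this way removes the dependency on the internal Service; 
after a failover the new TM Pods would look up the new JM podIP directly.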

 


> Rethink the necessity of the k8s internal Service even in non-HA mode
> ---------------------------------------------------------------------
>
>                 Key: FLINK-20249
>                 URL: https://issues.apache.org/jira/browse/FLINK-20249
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.11.0
>            Reporter: jiang7chengzitc
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.11.3
>
>         Attachments: k8s internal service - v2.pdf, k8s internal service.pdf
>
>
> In non-HA mode, k8s creates an internal Service that directs the 
> communication from the TaskManager Pods to the JobManager Pod, so that TM 
> Pods can re-register with the new JM Pod once a JM Pod failover occurs.
> However, I recently ran an experiment and found that on a JM Pod failover 
> (note: the new JM podIP has changed), k8s first creates new TM Pods and then 
> destroys the old TM Pods after a period of time; the job is then rescheduled 
> by the JM on the new TM Pods, which means the new TMs have registered with 
> the JM.
> During this process the internal Service stays active the whole time, but I 
> think keeping it is not necessary. In other words, we can weed out the 
> internal Service and use the JM podIP for the TM Pods' communication with 
> the JM Pod; in that case it would be consistent with HA mode (a sketch of 
> this Service follows below).
> Finally, the related experiments are attached (k8s internal service.pdf).
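
To make the description above concrete, here is a rough sketch of the shape of 
the internal Service created in non-HA mode, built with fabric8's 
ServiceBuilder. The name, selector labels, and port names are assumptions for 
illustration (the real decorator in Flink may differ, e.g. it may create a 
headless Service); 6123 and 6124 are Flink's default JM RPC and blob-server 
ports:

{code:java}
import io.fabric8.kubernetes.api.model.Service;
import io.fabric8.kubernetes.api.model.ServiceBuilder;

public class InternalServiceSketch {
    // Builds a Service selecting the JM pod and exposing its RPC and
    // blob-server ports; TMs reach the JM through this Service's DNS name.
    public static Service build(String clusterId) {
        return new ServiceBuilder()
                .withNewMetadata()
                    .withName(clusterId) // assumed: Service named after cluster-id
                .endMetadata()
                .withNewSpec()
                    .addToSelector("app", clusterId)          // assumed labels
                    .addToSelector("component", "jobmanager")
                    .addNewPort()
                        .withName("jobmanager-rpc")
                        .withPort(6123)  // default jobmanager.rpc.port
                    .endPort()
                    .addNewPort()
                        .withName("blobserver")
                        .withPort(6124)  // default blob.server.port
                    .endPort()
                .endSpec()
                .build();
    }
}
{code}

If this Service disappears while the cluster is running, the DNS name the TMs 
rely on stops resolving, which is exactly the failure mode described in point 
2 of the comment above.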


