[ https://issues.apache.org/jira/browse/FLINK-20249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236216#comment-17236216 ]
Till Rohrmann commented on FLINK-20249:
---------------------------------------

I guess you are right that we don't need the internal service for native K8s in non-HA mode. What the {{K8sResourceManager}} needs to ensure is that the TMs are started with the address/IP of the JM. On the other hand, how expensive is it to keep this internal service, [~jiang7chengzitc]?

> Rethink the necessity of the k8s internal Service even in non-HA mode
> ---------------------------------------------------------------------
>
>                 Key: FLINK-20249
>                 URL: https://issues.apache.org/jira/browse/FLINK-20249
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.11.0
>            Reporter: jiang7chengzitc
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.11.3
>
>         Attachments: k8s internal service.pdf
>
>
> In non-HA mode, Kubernetes creates an internal Service that directs communication from the TaskManager Pods to the JobManager Pod, so that TM Pods can re-register with a new JM Pod after a JM Pod failover.
>
> However, I recently ran an experiment and found the following: after a JM Pod failover (note: the new JM Pod IP has changed), Kubernetes first creates new TM Pods and then, after a period of time, destroys the old TM Pods. The job is then rescheduled by the JM on the new TM Pods, which means the new TMs have registered with the JM.
>
> The internal Service stays active during this whole process, but I think it is not necessary to keep it. In other words, we can remove the internal Service and use the JM Pod IP for the TM Pods' communication with the JM Pod. This would also be consistent with HA mode.
>
> Finally, the related experiments are described in the attachment.
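As a concrete illustration of the proposal, here is a minimal sketch of how a resource manager could start TMs with the JM Pod IP rather than the internal service DNS name. It assumes fabric8's Kubernetes client (which Flink's native K8s integration builds on) and a hypothetical {{JM_POD_IP}} env var populated on the JM container via the Downward API ({{fieldRef: status.podIP}}); it is not the actual {{K8sResourceManager}} code.

{code:java}
import io.fabric8.kubernetes.api.model.Container;
import io.fabric8.kubernetes.api.model.ContainerBuilder;

public final class TaskManagerPodSketch {

    public static void main(String[] args) {
        // The JM pod's own IP. Assumed to be exposed to the JM container
        // via the Downward API (fieldRef: status.podIP); the env var name
        // JM_POD_IP is hypothetical.
        String jmPodIp = System.getenv("JM_POD_IP");

        // Build the TM container spec with the JM pod IP baked in as a
        // literal env var, so the TM registers with the JM directly
        // instead of resolving the internal service DNS name.
        Container tmContainer = new ContainerBuilder()
                .withName("flink-taskmanager")
                .withImage("flink:1.11.3")
                .addNewEnv()
                    .withName("JOB_MANAGER_RPC_ADDRESS")
                    .withValue(jmPodIp)
                .endEnv()
                .build();

        System.out.println(tmContainer);
    }
}
{code}

Note that with the IP baked into the TM spec, a JM failover (which changes the Pod IP) means the old TMs can no longer re-register and must be replaced by new ones, which matches the behaviour observed in the experiment above.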