[ 
https://issues.apache.org/jira/browse/FLINK-20249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239987#comment-17239987
 ] 

Ruguo Yu edited comment on FLINK-20249 at 11/28/20, 2:43 PM:
-------------------------------------------------------------

[~trohrmann] [~xintongsong],

I have two points to share with you: 

1. I share the same opinion as [~xintongsong]:
{quote}
Ok, I think I see your point.

whether we really need to support JM failover w/o HA.

I think JM failover w/o HA is currently supported. I agree this is not 
something we want to promote/recommend, but I would suggest not to change it.
{quote}
IMO, non-HA mode indeed does not guarantee job continuity or data integrity 
after a JM failover, whether on YARN or K8s. If users choose this mode, they 
need to know and accept that cost; otherwise they should use HA mode. So we can 
do nothing here and keep the current YARN/K8s behavior.

2. Whether to consider adding a switch that controls starting the internal 
service [~trohrmann]
{quote}2) this behaviour should only work for K8s. On Yarn, the old TMs should 
not be able to reconnect to the newly started JM w/o service discovery. Hence, 
I see this as a implementation specific feature of K8s which we should not 
promote.

I am not saying to remove it just to make it symmetric to the other 
implementations but we should not make this "public" API.
{quote}
The difference in service discovery between YARN/Mesos and K8s makes them 
inconsistent in whether old TMs can be reused. We could introduce a 
configuration option such as "kubernetes.internal-service.enable": when it is 
true, the internal service is started and old TMs are reused after a JM 
failover instead of starting new ones (in this case, FLINK-20332 needs to be 
solved first); when it is false, the internal service is disabled and new TMs 
are started, which keeps the behavior consistent with the other active 
deployments. A rough sketch of such an option follows below.
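A minimal sketch of how such an option could be declared with Flink's 
ConfigOptions API, assuming the hypothetical key 
"kubernetes.internal-service.enable" (a proposal, not an existing Flink 
setting) and a default of true to preserve the current behavior:

{code:java}
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public class KubernetesInternalServiceOptions {

    // Hypothetical option; the key name, default value, and description are assumptions.
    public static final ConfigOption<Boolean> INTERNAL_SERVICE_ENABLE =
            ConfigOptions.key("kubernetes.internal-service.enable")
                    .booleanType()
                    .defaultValue(true)
                    .withDescription(
                            "Whether to create the internal Service used by TaskManagers to reach "
                                    + "the JobManager in non-HA mode. If false, no internal Service is "
                                    + "created and new TaskManagers are started after a JobManager failover.");
}
{code}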

WDYT, tks!

 



> Rethink the necessity of the k8s internal Service even in non-HA mode
> ---------------------------------------------------------------------
>
>                 Key: FLINK-20249
>                 URL: https://issues.apache.org/jira/browse/FLINK-20249
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.11.0
>            Reporter: Ruguo Yu
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
>         Attachments: k8s internal service - in english.pdf, k8s internal 
> service - v2.pdf, k8s internal service.pdf
>
>
> In non-HA mode, k8s creates an internal service that directs the 
> communication from TaskManager Pods to the JobManager Pod, so TM Pods can 
> re-register with the new JM Pod once a JM Pod failover occurs.
> However, I recently ran an experiment and found that once a JM Pod fails over 
> (note: the new JM podIP has changed), k8s first creates new TM pods and then 
> destroys the old TM pods after a period of time; the job is then rescheduled 
> by the JM on the new TM pods, which means the new TMs have registered with 
> the JM.
> During this process the internal service is active all the time, but I think 
> keeping it is not necessary. In other words, we can remove the internal 
> service and use the JM podIP for the TM pods' communication with the JM pod; 
> in this case it would be consistent with HA mode.
> Finally, the related experiments are attached (k8s internal service.pdf).
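
To illustrate the addressing difference described above, here is a hedged 
sketch of the two modes using Flink's Configuration API; the Service name and 
pod IP are made-up examples, not values taken from the experiment:

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.JobManagerOptions;

public class JobManagerAddressSketch {

    public static void main(String[] args) {
        // With the internal Service: TMs resolve the JM through a stable DNS name,
        // which stays valid even if the JM pod is recreated with a new IP.
        Configuration withInternalService = new Configuration();
        withInternalService.set(JobManagerOptions.ADDRESS, "my-flink-cluster"); // hypothetical Service name

        // Without the internal Service: TMs would have to use the JM pod IP directly,
        // which changes on every JM failover, so new TMs must be started
        // (consistent with how HA mode rediscovers the new leader instead).
        Configuration withoutInternalService = new Configuration();
        withoutInternalService.set(JobManagerOptions.ADDRESS, "10.244.1.23"); // hypothetical pod IP
    }
}
{code}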



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
