@Yang: I think this would be valuable to document. It's a natural question whether you can have standby JobManagers with Kubernetes. What do you think? If you agree, we could create a JIRA ticket and work on the "official" docs for this.
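If we do write those docs, the pod-IP binding that Yang recommends in the thread below could be illustrated with something like the following. This is an untested sketch: the Deployment name, labels, and image tag are placeholders, and it assumes the standalone-Kubernetes convention of passing the pod IP as the jobmanager host argument.

```yaml
# Sketch: three JobManager replicas (one leader, two standbys elected via
# ZooKeeper), each binding its rpc address to its own pod IP through the
# downward API instead of pointing all replicas at a shared Service.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 3
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
        - name: jobmanager
          image: flink:1.12
          # Passing the pod IP as the host argument overrides the
          # jobmanager.rpc.address from flink-conf.yaml for this replica.
          args: ["jobmanager", "$(POD_IP)"]
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
```

TaskManagers and clients would then look the leader's pod IP up in ZooKeeper rather than going through a Service, which is the behavior Yang describes below.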
On Thu, Jan 21, 2021, at 5:05 AM, Yang Wang wrote:

Hi Amit Bhatia,

> What is the correct way to start three jobmanager replicas with zk? Is
> there any link which explains this deployment scenario and configuration?

Please find more information in the mail below. Unfortunately, we do not yet
have documentation that guides users through this setup.

> How will we identify which of the three Job Manager replicas is the leader?

Just as I said, using a K8s service as the jobmanager rpc address is not good
practice: the TaskManager/Client cannot tell which replica behind the service
is the leader. Instead, we should bind the rpc address to the pod IP. The
TaskManager/Client can then find the leader address (pod IP) via ZooKeeper.

Could you please update your yaml files and deploy again? I think you will
see different results then.

Best,
Yang

On Thu, Jan 21, 2021 at 11:59 AM, Yang Wang <danrtsey...@gmail.com> wrote:

Hi Chirag Dewan,

Yes, we can have multiple replicas with ZK HA on K8s as well. The JobManagers
contend for leadership, and the leader writes its rpc address to the
ZooKeeper nodes. You can find more information on how the HA service works
here [1]. It is about the KubernetesHAService, but the ZooKeeperHAService
uses the same mechanism.

In such a setup, I strongly suggest not using the service as the JobManager
rpc address. Otherwise we get exactly the issue you mentioned: there are 3
replicas behind the same service endpoint, only one of them is the leader,
and the TaskManager/Client does not know how to contact it.

Instead, I suggest not creating the internal service and binding the pod IP
to the JobManager rpc address. The TaskManager/Client will then retrieve the
leader address (pod IP + port) from ZooKeeper and contact the leader via
that address.

Please find more information and the example here [1][2].

[1]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-144%3A+Native+Kubernetes+HA+for+Flink
[2]. https://issues.apache.org/jira/browse/FLINK-20982?focusedCommentId=17265715&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17265715

Best,
Yang

On Wed, Jan 20, 2021 at 12:27 PM, Amit Bhatia <bhatia.amit1...@gmail.com> wrote:

Hi Yang,

I tried deploying Flink with three JobManager replicas to test a faster job
recovery scenario. Below is my deployment:

$ kubectl get po -n amit | grep zk
eric-data-coordinator-zk-0                 1/1   Running   0   6d21h
eric-data-coordinator-zk-1                 1/1   Running   0   6d21h
eric-data-coordinator-zk-2                 1/1   Running   0   6d21h
flink-jobmanager-ha-zk-1-5d58dc469-8bjpb   1/1   Running   0   19h
flink-jobmanager-ha-zk-1-5d58dc469-klg5p   1/1   Running   0   19h
flink-jobmanager-ha-zk-1-5d58dc469-kvwzk   1/1   Running   0   19h

$ kubectl get svc -n amit | grep zk
flink-jobmanager-ha-rest-zk1                NodePort    10.100.118.186   <none>   8081:32115/TCP                21h
flink-jobmanager-ha-zk1                     ClusterIP   10.111.135.174   <none>   6123/TCP,6124/TCP,8081/TCP    21h
eric-data-coordinator-zk                    ClusterIP   10.105.139.167   <none>   2181/TCP,8080/TCP,21007/TCP   7d20h
eric-data-coordinator-zk-ensemble-service   ClusterIP   None             <none>   2888/TCP,3888/TCP             7d20h

Flink ConfigMap:
================
apiVersion: v1
kind: ConfigMap
metadata:
  name: flink-config-ha-zk-1
  namespace: amit
  labels:
    app: flink
data:
  flink-conf.yaml: |+
    jobmanager.rpc.address: flink-jobmanager-ha-zk1
    taskmanager.numberOfTaskSlots: 2
    blob.server.port: 6124
    jobmanager.rpc.port: 6123
    taskmanager.rpc.port: 6122
    queryable-state.proxy.ports: 6125
    jobmanager.memory.process.size: 1600m
    taskmanager.memory.process.size: 1728m
    parallelism.default: 2
    # High Availability parameters
    high-availability: zookeeper
    high-availability.cluster-id: /haclusterzk1
    high-availability.storageDir: file:///opt/flink/recovery/
    high-availability.zookeeper.path.root: /flinkhazk
    high-availability.zookeeper.quorum: eric-data-coordinator-zk:2181
    high-availability.jobmanager.port: 6123
================

In one of the three JobManager pods I am getting this error:

2021-01-19 08:18:33,982 INFO  org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService [] - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/resource_manager_lock'}.
2021-01-19 08:21:39,381 WARN  org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever [] - Error while retrieving the leader gateway. Retrying to connect to akka.tcp://flink@flink-jobmanager-ha-zk1:6123/user/rpc/dispatcher_1.
[the same WARN repeats until the last entry below]
2021-01-20 04:10:39,836 WARN  org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever [] - Error while retrieving the leader gateway. Retrying to connect to akka.tcp://flink@flink-jobmanager-ha-zk1:6123/user/rpc/dispatcher_1.

And when trying to access the GUI I get the error below:

[image attachment: image.png]

In ZooKeeper I can see that all three ids are there:

[zk: localhost:2181(CONNECTED) 5] ls /flinkhazk/haclusterzk1/leaderlatch/dispatcher_lock
[_c_1d5fc8b1-063f-4a1c-ad0f-ec46b6f10f36-latch-0000000020, _c_229d0739-8854-4a5a-ace7-377d9edc575f-latch-0000000018, _c_4eac3aaf-3f0f-4297-ac7f-086821548697-latch-0000000019]
[zk: localhost:2181(CONNECTED) 6]

So I have the following questions:

1) What is the correct way to start three jobmanager replicas with zk? Is
there any link which explains this deployment scenario and configuration?

2) How will we identify which of the three Job Manager replicas is the leader?

Regards,
Amit Bhatia

On Wed, Jan 20, 2021 at 9:44 AM, Chirag Dewan <chirag.dewa...@yahoo.in> wrote:

Hi,

Can we have multiple replicas with ZK HA on K8s as well? In that case, how do
TaskManagers and clients recover the JobManager RPC address? Is it updated in
ZK? Also, since there are 3 replicas behind the same service endpoint and
only one of them is the leader, how should clients reach the leader
JobManager?

On Wednesday, 20 January, 2021, 07:41:20 am IST, Yang Wang
<danrtsey...@gmail.com> wrote:

If you do not want to run multiple JobManagers simultaneously, then I think a
"Job" for the application cluster with HA enabled is enough. K8s will also
launch a new pod/container when the old one terminates exceptionally.

Best,
Yang

On Wed, Jan 20, 2021 at 10:08 AM, Yang Wang <danrtsey...@gmail.com> wrote:

Yes.
Using a "Deployment" instead of a "Job" for the application cluster also
makes sense. Actually, in the native K8s integration we always use a
Deployment for the JobManager.

But please note that the Deployment may relaunch the JobManager pod even
after you cancel the Flink job.

Best,
Yang

On Wed, Jan 20, 2021 at 5:29 AM, Ashish Nigam <ashnigamt...@gmail.com> wrote:

Yang,
For application clusters, does it make sense to deploy the JobManager as a
"Deployment" rather than as a "Job", as suggested in the docs? I am asking
because I am thinking of deploying the JobManager in HA mode even for
application clusters.

Thanks,
Ashish

On Tue, Jan 19, 2021 at 6:16 AM, Yang Wang <danrtsey...@gmail.com> wrote:

Usually you do not need to start multiple JobManagers simultaneously. The
JobManager is a Deployment; a new pod/container will be launched once the old
one terminates exceptionally.

If you still want to start multiple JobManagers to get faster recovery, you
can set the replica count greater than 1 for a standalone cluster on K8s [1].
The native integration [2] does not support this configuration yet [3].

Please note that the key point for enabling HA is not starting multiple
JobManagers, whether simultaneously or sequentially. You need to set up the
ZooKeeperHAService [4] or KubernetesHAService [5] to ensure the Flink job can
recover from the latest successful checkpoint.

[1]. https://ci.apache.org/projects/flink/flink-docs-master/deployment/resource-providers/standalone/kubernetes.html#session-cluster-resource-definitions
[2]. https://ci.apache.org/projects/flink/flink-docs-master/deployment/resource-providers/native_kubernetes.html
[3]. https://issues.apache.org/jira/browse/FLINK-17707
[4]. https://ci.apache.org/projects/flink/flink-docs-master/deployment/ha/zookeeper_ha.html
[5]. https://ci.apache.org/projects/flink/flink-docs-master/deployment/ha/kubernetes_ha.html

Best,
Yang

On Tue, Jan 19, 2021 at 8:45 PM, Amit Bhatia <bhatia.amit1...@gmail.com> wrote:

Hi,

I am deploying Flink 1.12 on K8s. Can anyone confirm whether we can deploy
multiple JobManager pods on K8s for HA, or should there always be only a
single JobManager pod?

Regards,
Amit Bhatia
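Editor's note on question 2 in the thread (identifying the leader): the latch znodes Amit listed only show election candidates; the elected leader's rpc address should be stored under the corresponding /leader/ path. A hedged, untested sketch, reusing the pod name and znode paths from the thread above (the znode data is serialized, so the address appears amid some binary output):

```shell
# Open a ZooKeeper shell in one of the ZK pods (name from the thread above).
kubectl -n amit exec -it eric-data-coordinator-zk-0 -- zkCli.sh

# Inside zkCli: list the leadership candidates ...
ls /flinkhazk/haclusterzk1/leaderlatch/dispatcher_lock

# ... then read the elected leader's address; look for the
# akka.tcp://flink@<pod-ip>:6123/... URL embedded in the output.
get /flinkhazk/haclusterzk1/leader/dispatcher_lock
```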