Thanks guys. The reason I am interested in rolling updates is to avoid complete restarts when parameters (for example, parallelism) change.
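Concretely, what I am trying to avoid is the usual stop-and-resubmit cycle. A rough sketch of how I understand a parallelism change has to work today (untested; the bucket, job id, and savepoint path are placeholders, and depending on the setup the client may also need target/cluster-id options to locate the JobManager):

# stop the job and take a savepoint
./bin/flink stop --savepointPath s3://my-bucket/savepoints <job-id>

# resubmit with the new parallelism, restoring from that savepoint
./bin/flink run-application -p 20 -t kubernetes-application \
  -Dkubernetes.cluster-id=k8s-ha-app1 \
  -Dexecution.savepoint.path=s3://my-bucket/savepoints/savepoint-<id> \
  local:///opt/flink/examples/streaming/StateMachineExample.jar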
> On Dec 15, 2020, at 8:40 PM, Yang Wang <danrtsey...@gmail.com> wrote:
>
> Hi Boris,
>
> What is -p 10?
> It is the same as --parallelism 10: it sets the default parallelism to 10.
>
> does it require a special container build?
> No, the official Flink Docker image can be used directly. Unfortunately, we
> do not have the image yet, and we are trying to figure that out.
> You could follow the instructions below to build your own image.
>
> git clone https://github.com/apache/flink-docker.git
> git checkout dev-master
> ./add-custom.sh -u https://apache.website-solution.net/flink/flink-1.12.0/flink-1.12.0-bin-scala_2.11.tgz -n flink-1.12.0
> cd dev/flink-1.12.0-debian
> docker build . -t flink:flink-1.12.0
> docker push flink:flink-1.12.0
>
> This is if I use HDFS for savepointing, right? I can instead use PVC-based savepointing, correct?
> It is an example of storing the HA-related data in OSS (Alibaba Cloud Object
> Storage, similar to S3). Since we require distributed storage, I am afraid
> you cannot use a PVC here. Instead, you could use MinIO.
>
> Can I control the amount of standby JMs?
> Currently you cannot control the number of JobManagers, but only because we
> have not introduced a config option for it. You could, however, change it
> manually via `kubectl edit deploy <clusterID>`; that should also work.
>
> Finally, what is the behavior on a rolling restart of the JM deployment?
> Once a JobManager terminates, it loses the leadership and a standby one
> takes over. So on a rolling restart of the JobManager deployment you will find
> that the leader switches multiple times and your job also restarts multiple
> times. I am not sure why you need to roll the JobManager deployment. We use
> a Deployment for the JobManager in Flink only because we want the JobManager
> to be relaunched once it crashes. Another reason for multiple JobManagers is
> faster recovery.
>
> Best,
> Yang
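For anyone following this later, the manual edit mentioned above could presumably also be done non-interactively; a rough sketch, assuming the JobManager Deployment created by the native integration is named after the cluster id (here k8s-ha-app1):

# interactive edit, as suggested above
kubectl edit deploy k8s-ha-app1

# or set the replica count directly (a value of 2 is just an example)
kubectl scale deployment/k8s-ha-app1 --replicas=2

# list the HA ConfigMaps (named after the cluster id); their contents
# should record which JobManager currently holds leadership
kubectl get configmaps | grep k8s-ha-app1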
>
> Boris Lublinsky <boris.lublin...@lightbend.com> wrote on Wednesday, December 16, 2020 at 9:09 AM:
> Thanks Chesnay for your quick response.
> I read the documentation at
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-144%3A+Native+Kubernetes+HA+for+Flink#FLIP144:NativeKubernetesHAforFlink-NativeK8s
> more carefully and found the sample I was looking for:
>
> ./bin/flink run-application -p 10 -t kubernetes-application \
> -Dkubernetes.cluster-id=k8s-ha-app1 \
> -Dkubernetes.container.image=flink:k8s-ha \
> -Dkubernetes.container.image.pull-policy=Always \
> -Djobmanager.heap.size=4096m -Dtaskmanager.memory.process.size=4096m \
> -Dkubernetes.jobmanager.cpu=1 -Dkubernetes.taskmanager.cpu=2 -Dtaskmanager.numberOfTaskSlots=4 \
> -Dhigh-availability=org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory \
> -Dhigh-availability.storageDir=oss://flink/flink-ha \
> -Drestart-strategy=fixed-delay -Drestart-strategy.fixed-delay.attempts=10 \
> -Dcontainerized.master.env.ENABLE_BUILT_IN_PLUGINS=flink-oss-fs-hadoop-1.12.jar \
> -Dcontainerized.taskmanager.env.ENABLE_BUILT_IN_PLUGINS=flink-oss-fs-hadoop-1.12.jar \
> local:///opt/flink/examples/streaming/StateMachineExample.jar
>
> A couple of questions about it:
>
> ./bin/flink run-application -p 10 -t used to be ./bin/flink run-application -t. What is -p 10?
> -Dkubernetes.container.image=flink:k8s-ha: does it require a special container build?
>
> -Dhigh-availability.storageDir=oss://flink/flink-ha \
> -Dcontainerized.master.env.ENABLE_BUILT_IN_PLUGINS=flink-oss-fs-hadoop-1.12.jar \
> -Dcontainerized.taskmanager.env.ENABLE_BUILT_IN_PLUGINS=flink-oss-fs-hadoop-1.12.jar \
>
> This is if I use HDFS for savepointing, right? I can instead use PVC-based savepointing, correct?
>
> Also, I was trying to understand how it works, and from the documentation it
> sounds like there is one active and one or more standby JMs. Can I control the
> number of standby JMs?
>
> Finally, what is the behavior on a rolling restart of the JM deployment?
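In case it is useful to others reading this thread: adapting the sample above to an S3-compatible store such as MinIO (which Yang suggests instead of a PVC) would presumably look roughly like the following. The endpoint, credentials, and bucket are placeholders, and the plugin jar name should be checked against the actual distribution:

./bin/flink run-application -p 10 -t kubernetes-application \
  -Dkubernetes.cluster-id=k8s-ha-app1 \
  -Dkubernetes.container.image=flink:flink-1.12.0 \
  -Dhigh-availability=org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory \
  -Dhigh-availability.storageDir=s3://flink-ha/recovery \
  -Ds3.endpoint=http://minio.minio.svc:9000 \
  -Ds3.path.style.access=true \
  -Ds3.access-key=<access-key> -Ds3.secret-key=<secret-key> \
  -Dcontainerized.master.env.ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-hadoop-1.12.0.jar \
  -Dcontainerized.taskmanager.env.ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-hadoop-1.12.0.jar \
  local:///opt/flink/examples/streaming/StateMachineExample.jar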
>
>> On Dec 15, 2020, at 10:42 AM, Chesnay Schepler <ches...@apache.org> wrote:
>>
>> Unfortunately no; there are some discussions going on in the
>> docker-library/official-images PR
>> (https://github.com/docker-library/official-images/pull/9249) that have to
>> be resolved first, but currently these would require changes on the Flink
>> side that we cannot do (because it is already released!). We are not sure
>> yet whether we can get the PR accepted and defer further changes to 1.12.1.
>>
>> On 12/15/2020 5:17 PM, Boris Lublinsky wrote:
>>> Thanks.
>>> Do you have an ETA for the Docker images?
>>>
>>>> On Dec 14, 2020, at 3:43 AM, Chesnay Schepler <ches...@apache.org> wrote:
>>>>
>>>> 1) It is compiled with Java 8 but runs on Java 8 & 11.
>>>> 2) Docker images are not yet published.
>>>> 3) It is mentioned at the top of the Kubernetes HA Services documentation
>>>> that it also works for the native Kubernetes integration:
>>>> "Kubernetes high availability services can only be used when deploying to
>>>> Kubernetes. Consequently, they can be configured when using standalone
>>>> Flink on Kubernetes
>>>> (https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/resource-providers/standalone/kubernetes.html)
>>>> or the native Kubernetes integration
>>>> (https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/resource-providers/native_kubernetes.html)."
>>>> From what I understand you only need to configure the 3 listed options;
>>>> the documentation also contains an example configuration:
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/ha/kubernetes_ha.html#example-configuration
>>>>
>>>> On 12/14/2020 4:52 AM, Boris Lublinsky wrote:
>>>>> It is great that Flink 1.12 is out. Several questions:
>>>>>
>>>>> 1. The official Flink 1.12 distribution (https://flink.apache.org/downloads.html)
>>>>> specifies Scala versions, but not Java versions. Is it Java 8?
>>>>> 2. I do not see any 1.12 Docker images at https://hub.docker.com/_/flink.
>>>>> Are they somewhere else?
>>>>> 3. Flink 1.12 introduces Kubernetes HA support
>>>>> (https://ci.apache.org/projects/flink/flink-docs-stable/deployment/ha/kubernetes_ha.html),
>>>>> but the Flink native Kubernetes documentation
>>>>> (https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/resource-providers/native_kubernetes.html)
>>>>> makes no mention of HA. Are the two integrated? Do you have any examples
>>>>> of starting an HA cluster using Flink native Kubernetes?
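For completeness, the "3 listed options" referred to above appear to be the following; a minimal flink-conf.yaml sketch, with the cluster id and storage bucket left as placeholders (the storageDir can point at any distributed filesystem the cluster can reach, e.g. S3, OSS, or HDFS):

kubernetes.cluster-id: <cluster-id>
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: s3://<bucket>/flink-ha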