Sounds good. Thank you!

Hao Sun
On Thu, Feb 27, 2020 at 6:52 PM Yang Wang <danrtsey...@gmail.com> wrote:

> Hi Hao Sun,
>
> I just posted the explanation to the user ML so that others who run into
> the same problem can also see it.
>
>> Given the job graph is fetched from the jar, do we still need Zookeeper
>> for HA? Maybe we still need it for checkpoint locations?
>
> Yes, we still need ZooKeeper for a complete recovery (maybe in the future
> we will have a native K8s HA based on etcd). You are right, we still need
> it for finding the checkpoint locations. ZooKeeper is also used for leader
> election and leader retrieval.
>
> Best,
> Yang
>
> On Fri, Feb 28, 2020 at 1:41 AM Hao Sun <ha...@zendesk.com> wrote:
>
>> Hi Yang, given the job graph is fetched from the jar, do we still need
>> Zookeeper for HA? Maybe we still need it for checkpoint locations?
>>
>> Hao Sun
>>
>> On Thu, Feb 27, 2020 at 5:13 AM Yang Wang <danrtsey...@gmail.com> wrote:
>>
>>> Hi Jin Yi,
>>>
>>> For a standalone per-job cluster, recovery works a little differently.
>>> Just as you say, the user jar is built into the image, so when the
>>> JobManager fails and is relaunched by K8s, the user `main()` will be
>>> executed again to get the job graph, unlike a session cluster, which
>>> gets the job graph from high-availability storage.
>>> The job will then be submitted again and can recover from the latest
>>> checkpoint (assuming you have configured high availability).
>>>
>>> Best,
>>> Yang
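A rough sketch of the HA setup being described here, as it could look in
flink-conf.yaml. The ZooKeeper quorum, S3 bucket, and cluster id below are
placeholders, not values taken from this thread:

    # HA backed by ZooKeeper; HA metadata and checkpoint pointers go to S3
    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-0:2181,zk-1:2181,zk-2:2181
    high-availability.storageDir: s3://my-bucket/flink/ha/
    high-availability.cluster-id: /my-flink-job   # keep stable across JM restarts
    # Checkpoints and savepoints also live on S3, so no persistent volume is needed
    state.backend: filesystem
    state.checkpoints.dir: s3://my-bucket/flink/checkpoints/
    state.savepoints.dir: s3://my-bucket/flink/savepoints/

With something like this in place, a relaunched JobManager can locate the
latest completed checkpoint through ZooKeeper and the metadata stored under
high-availability.storageDir, which is the recovery path described above.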
>>> On Thu, Feb 27, 2020 at 2:50 PM Jin Yi <eleanore....@gmail.com> wrote:
>>>
>>>> Hi Yang,
>>>>
>>>> regarding your statement below:
>>>>
>>>> Since you are starting JM/TM with a K8s deployment, when they fail new
>>>> JM/TM will be created. If you do not set the high availability
>>>> configuration, your jobs can recover when a TM fails. However, they
>>>> cannot recover when the JM fails. With HA configured, the jobs can
>>>> always be recovered and you do not need to re-submit them.
>>>>
>>>> Does it also apply to a Flink Job Cluster? When the JM pod is restarted
>>>> by Kubernetes, the image also contains the application jar, so if the
>>>> statement also applies to the Flink Job Cluster mode, can you please
>>>> elaborate why?
>>>>
>>>> Thanks a lot!
>>>> Eleanore
>>>>
>>>> On Mon, Feb 24, 2020 at 6:36 PM Yang Wang <danrtsey...@gmail.com> wrote:
>>>>
>>>>> Hi M Singh,
>>>>>
>>>>>> Mans - If we use the session based deployment option for K8 - I
>>>>>> thought K8 will automatically restart any failed TM or JM.
>>>>>> In the case of a failed TM - the job will probably recover, but in
>>>>>> the case of a failed JM - perhaps we need to resubmit all jobs.
>>>>>> Let me know if I have misunderstood anything.
>>>>>
>>>>> Since you are starting JM/TM with a K8s deployment, when they fail new
>>>>> JM/TM will be created. If you do not set the high availability
>>>>> configuration, your jobs can recover when a TM fails. However, they
>>>>> cannot recover when the JM fails. With HA configured, the jobs can
>>>>> always be recovered and you do not need to re-submit them.
>>>>>
>>>>>> Mans - Is there any safe way of passing creds?
>>>>>
>>>>> Yes, you are right, using a ConfigMap to pass the credentials is not
>>>>> safe. On K8s, I think you could use Secrets instead[1].
>>>>>
>>>>>> Mans - Does a task manager failure cause the job to fail? My
>>>>>> understanding is that JM failures are catastrophic while TM failures
>>>>>> are recoverable.
>>>>>
>>>>> What I mean is that the job fails, and it can then be restarted by
>>>>> your configured restart strategy[2].
>>>>>
>>>>>> Mans - So if we are saving checkpoints in S3 then there is no need
>>>>>> for disks - should we use emptyDir?
>>>>>
>>>>> Yes, if you are saving the checkpoints in S3 and also set
>>>>> `high-availability.storageDir` to S3, then you do not need a
>>>>> persistent volume. Since the local directory is only used as a local
>>>>> cache, you can directly use the overlay filesystem or emptyDir
>>>>> (better IO performance).
>>>>>
>>>>> [1].
>>>>> https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/
>>>>> [2].
>>>>> https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#fault-tolerance
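A minimal sketch of the Secret-based approach pointed to in [1] above,
assuming AWS-style credentials injected as environment variables. The secret
name, key names, and placeholder values are illustrative only:

    apiVersion: v1
    kind: Secret
    metadata:
      name: flink-aws-credentials   # hypothetical name
    type: Opaque
    stringData:
      AWS_ACCESS_KEY_ID: "<access-key-id>"         # placeholder, never commit real values
      AWS_SECRET_ACCESS_KEY: "<secret-access-key>" # placeholder

and then, in the jobmanager/taskmanager container spec (fragment, other
fields omitted):

      containers:
        - name: taskmanager
          image: flink:1.10          # placeholder tag
          envFrom:
            - secretRef:
                name: flink-aws-credentials   # exposes the keys above as env vars

Exposed this way the credentials never sit in a ConfigMap or in the image,
and the S3 filesystem can usually pick them up through the standard AWS
credential provider chain.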
>>>>> On Tue, Feb 25, 2020 at 5:53 AM M Singh <mans2si...@yahoo.com> wrote:
>>>>>
>>>>>> Thanks Wang for your detailed answers.
>>>>>>
>>>>>> From what I understand the native_kubernetes also leans towards
>>>>>> creating a session and submitting a job to it.
>>>>>>
>>>>>> Regarding other recommendations, please see my inline comments and
>>>>>> advice.
>>>>>>
>>>>>> On Sunday, February 23, 2020, 10:01:10 PM EST, Yang Wang
>>>>>> <danrtsey...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Singh,
>>>>>>
>>>>>> Glad to hear that you are looking to run Flink on Kubernetes. I am
>>>>>> trying to answer your questions based on my limited knowledge, and
>>>>>> others could correct me and add some more supplements.
>>>>>>
>>>>>> I think the biggest difference between a session cluster and a
>>>>>> per-job cluster on Kubernetes is the isolation. For per-job, a
>>>>>> dedicated Flink cluster is started for that one job and no other
>>>>>> jobs can be submitted. Once the job is finished, the Flink cluster
>>>>>> is destroyed immediately.
>>>>>> The second point is one-step submission. You do not need to start a
>>>>>> Flink cluster first and then submit a job to the existing session.
>>>>>>
>>>>>> > Are there any benefits with regards to
>>>>>> 1. Configuring the jobs
>>>>>> No matter whether you use a per-job cluster or submit to an existing
>>>>>> session cluster, they share the same configuration mechanism. You do
>>>>>> not have to change any code or configuration.
>>>>>>
>>>>>> 2. Scaling the taskmanager
>>>>>> Since you are using the standalone cluster on Kubernetes, it does not
>>>>>> provide an active resourcemanager. You need to use external tools to
>>>>>> monitor and scale up the taskmanagers. The active integration is
>>>>>> still evolving and you could have a taste[1].
>>>>>>
>>>>>> Mans - If we use the session based deployment option for K8 - I
>>>>>> thought K8 will automatically restart any failed TM or JM.
>>>>>> In the case of a failed TM - the job will probably recover, but in
>>>>>> the case of a failed JM - perhaps we need to resubmit all jobs.
>>>>>> Let me know if I have misunderstood anything.
>>>>>>
>>>>>> 3. Restarting jobs
>>>>>> For the session cluster, you can directly cancel the job and
>>>>>> re-submit. For a per-job cluster, when the job is canceled, you need
>>>>>> to start a new per-job cluster from the latest savepoint.
>>>>>>
>>>>>> 4. Managing the flink jobs
>>>>>> The REST API and the flink command line can be used to manage the
>>>>>> jobs (e.g. flink cancel, etc.). I think there is no difference
>>>>>> between session and per-job here.
>>>>>>
>>>>>> 5. Passing credentials (in case of AWS, etc)
>>>>>> I am not sure how you provide your credentials. If you put them in a
>>>>>> config map and then mount it into the jobmanager/taskmanager pod,
>>>>>> then both session and per-job can support this.
>>>>>>
>>>>>> Mans - Is there any safe way of passing creds?
>>>>>>
>>>>>> 6. Fault tolerance and recovery of jobs from failure
>>>>>> For a session cluster, if one taskmanager crashes, then all the jobs
>>>>>> which have tasks on this taskmanager will fail.
>>>>>> Both session and per-job can be configured with high availability and
>>>>>> recover from the latest checkpoint.
>>>>>>
>>>>>> Mans - Does a task manager failure cause the job to fail? My
>>>>>> understanding is that JM failures are catastrophic while TM failures
>>>>>> are recoverable.
>>>>>>
>>>>>> > Is there any need for specifying volume for the pods?
>>>>>> No, you do not need to specify a volume for the pods. All the data in
>>>>>> the pod's local directory is temporary. When a pod crashes and is
>>>>>> relaunched, the taskmanager will retrieve the checkpoint from
>>>>>> ZooKeeper + S3 and resume from the latest checkpoint.
>>>>>>
>>>>>> Mans - So if we are saving checkpoints in S3 then there is no need
>>>>>> for disks - should we use emptyDir?
>>>>>>
>>>>>> [1].
>>>>>> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html
>>>>>>
>>>>>> On Sun, Feb 23, 2020 at 2:28 AM M Singh <mans2si...@yahoo.com> wrote:
>>>>>>
>>>>>> Hey Folks:
>>>>>>
>>>>>> I am trying to figure out the options for running Flink on Kubernetes
>>>>>> and am trying to find out the pros and cons of running in Flink
>>>>>> Session vs Flink Cluster mode (
>>>>>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html#flink-session-cluster-on-kubernetes
>>>>>> ).
>>>>>>
>>>>>> I understand that in job mode there is no need to submit the job
>>>>>> since it is part of the job image. But what are the other pros and
>>>>>> cons of this approach vs session mode, where a job manager is
>>>>>> deployed and flink jobs can be submitted to it? Are there any
>>>>>> benefits with regards to:
>>>>>>
>>>>>> 1. Configuring the jobs
>>>>>> 2. Scaling the taskmanager
>>>>>> 3. Restarting jobs
>>>>>> 4. Managing the flink jobs
>>>>>> 5. Passing credentials (in case of AWS, etc)
>>>>>> 6. Fault tolerance and recovery of jobs from failure
>>>>>>
>>>>>> Also, we will be keeping the checkpoints for the jobs on S3. Is there
>>>>>> any need for specifying volume for the pods? If a volume is required,
>>>>>> do we need a provisioned volume, and what are the recommended
>>>>>> alternatives/considerations, especially with AWS.
>>>>>>
>>>>>> If there are any other considerations, please let me know.
>>>>>>
>>>>>> Thanks for your advice.
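For the volume question above, a sketch of what the emptyDir approach that
was suggested could look like in the taskmanager deployment's pod spec. This
is only the volume-related fragment; the container name, image tag, and
mount path are placeholders, and ports, config, and jobmanager wiring are
omitted:

      spec:
        containers:
          - name: taskmanager
            image: flink:1.10        # placeholder tag
            args: ["taskmanager"]
            volumeMounts:
              - name: flink-local-dir
                mountPath: /tmp      # scratch space for local cache only
        volumes:
          - name: flink-local-dir
            emptyDir: {}             # node-local, discarded when the pod is rescheduled

Because the checkpoints and HA metadata live in S3 (plus ZooKeeper), losing
this directory on a pod restart only costs the local cache, which matches
the advice given earlier in the thread.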