Hi Jin Yi,

For a standalone per-job cluster, recovery works a little differently. Just as you say, the user jar is built into the image, so when the JobManager fails and is relaunched by K8s, the user's `main()` is executed again to regenerate the job graph, unlike a session cluster, which recovers the job graph from the high-availability storage. The job is then submitted again and can recover from the latest checkpoint (assuming you have configured high availability).
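For reference, here is a minimal flink-conf.yaml sketch of such a setup (just an illustration; the ZooKeeper quorum, the S3 bucket, and the cluster-id below are placeholders you would replace with your own values):

  # ZooKeeper-based HA, so a relaunched JobManager can find the latest completed checkpoint
  high-availability: zookeeper
  high-availability.zookeeper.quorum: zk-0:2181,zk-1:2181,zk-2:2181
  high-availability.storageDir: s3://my-flink-bucket/ha/
  # use a distinct cluster-id for each per-job cluster
  high-availability.cluster-id: /my-per-job-cluster
  # checkpoints on S3, so no persistent volume is needed for the pods
  state.backend: filesystem
  state.checkpoints.dir: s3://my-flink-bucket/checkpoints/
  # restart strategy applied when tasks fail
  restart-strategy: fixed-delay
  restart-strategy.fixed-delay.attempts: 3
  restart-strategy.fixed-delay.delay: 10 s

With options like these, when the relaunched JobManager runs `main()` and re-submits the job, it could pick up the latest completed checkpoint from ZooKeeper/S3 and resume from there.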
Best,
Yang

Jin Yi <eleanore....@gmail.com> wrote on Thu, Feb 27, 2020 at 2:50 PM:

> Hi Yang,
>
> Regarding your statement below:
>
> Since you are starting the JM/TM with a K8s deployment, new JM/TM pods will be created when they fail. If you do not set the high-availability configuration, your jobs can recover when a TM fails, but they cannot recover when the JM fails. With HA configured, the jobs can always be recovered and you do not need to re-submit them.
>
> Does this also apply to a Flink Job Cluster? When the JM pod is restarted by Kubernetes, the image also contains the application jar, so if the statement applies to the Flink Job Cluster mode as well, can you please elaborate on why?
>
> Thanks a lot!
> Eleanore
>
> On Mon, Feb 24, 2020 at 6:36 PM Yang Wang <danrtsey...@gmail.com> wrote:
>
>> Hi M Singh,
>>
>>> Mans - If we use the session based deployment option for K8s - I thought K8s will automatically restart any failed TM or JM. In the case of a failed TM the job will probably recover, but in the case of a failed JM perhaps we need to resubmit all jobs. Let me know if I have misunderstood anything.
>>
>> Since you are starting the JM/TM with a K8s deployment, new JM/TM pods will be created when they fail. If you do not set the high-availability configuration, your jobs can recover when a TM fails, but they cannot recover when the JM fails. With HA configured, the jobs can always be recovered and you do not need to re-submit them.
>>
>>> Mans - Is there any safe way of passing creds?
>>
>> Yes, you are right, using a ConfigMap to pass the credentials is not safe. On K8s, I think you could use Secrets instead [1].
>>
>>> Mans - Does a task manager failure cause the job to fail? My understanding is that JM failures are catastrophic while TM failures are recoverable.
>>
>> What I mean is that the job fails, and it can then be restarted by your configured restart strategy [2].
>>
>>> Mans - So if we are saving checkpoints in S3 then there is no need for disks - should we use emptyDir?
>>
>> Yes, if you are saving the checkpoints in S3 and also set `high-availability.storageDir` to S3, then you do not need a persistent volume. Since the local directory is only used as a local cache, you can directly use the overlay filesystem or emptyDir (better IO performance).
>>
>> [1]. https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/
>> [2]. https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#fault-tolerance
>>
>> M Singh <mans2si...@yahoo.com> wrote on Tue, Feb 25, 2020 at 5:53 AM:
>>
>>> Thanks Wang for your detailed answers.
>>>
>>> From what I understand, the native_kubernetes integration also leans towards creating a session and submitting a job to it.
>>>
>>> Regarding the other recommendations, please see my inline comments and advise.
>>>
>>> On Sunday, February 23, 2020, 10:01:10 PM EST, Yang Wang <danrtsey...@gmail.com> wrote:
>>>
>>> Hi Singh,
>>>
>>> Glad to hear that you are looking to run Flink on Kubernetes. I am trying to answer your questions based on my limited knowledge, and others could correct me and add some more supplements.
>>>
>>> I think the biggest difference between a session cluster and a per-job cluster on Kubernetes is the isolation. For per-job, a dedicated Flink cluster will be started for that single job, and no other jobs can be submitted.
>>> Once the job is finished, the Flink cluster will be destroyed immediately. The second point is one-step submission: you do not need to start a Flink cluster first and then submit a job to the existing session.
>>>
>>> > Are there any benefits with regards to
>>>
>>> 1. Configuring the jobs
>>> Whether you are using a per-job cluster or submitting to an existing session cluster, they share the same configuration mechanism. You do not have to change any code or configuration.
>>>
>>> 2. Scaling the taskmanager
>>> Since you are using a standalone cluster on Kubernetes, it does not provide an active ResourceManager. You need to use external tools to monitor and scale up the TaskManagers. The active integration is still evolving and you could give it a try [1].
>>>
>>> Mans - If we use the session based deployment option for K8s - I thought K8s will automatically restart any failed TM or JM. In the case of a failed TM the job will probably recover, but in the case of a failed JM perhaps we need to resubmit all jobs. Let me know if I have misunderstood anything.
>>>
>>> 3. Restarting jobs
>>> For a session cluster, you can directly cancel the job and re-submit it. For a per-job cluster, when the job is canceled, you need to start a new per-job cluster from the latest savepoint.
>>>
>>> 4. Managing the flink jobs
>>> The REST API and the flink command line can be used to manage the jobs (e.g. flink cancel, etc.). I think there is no difference between session and per-job here.
>>>
>>> 5. Passing credentials (in case of AWS, etc)
>>> I am not sure how you provide your credentials. If you put them in a ConfigMap and then mount it into the JobManager/TaskManager pods, both session and per-job clusters support this.
>>>
>>> Mans - Is there any safe way of passing creds?
>>>
>>> 6. Fault tolerance and recovery of jobs from failure
>>> For a session cluster, if one TaskManager crashes, then all the jobs that have tasks on that TaskManager will fail. Both session and per-job clusters can be configured with high availability and recover from the latest checkpoint.
>>>
>>> Mans - Does a task manager failure cause the job to fail? My understanding is that JM failures are catastrophic while TM failures are recoverable.
>>>
>>> > Is there any need for specifying volume for the pods?
>>> No, you do not need to specify a volume for the pods. All the data in the pod's local directory is temporary. When a pod crashes and is relaunched, the TaskManager will retrieve the checkpoint from ZooKeeper + S3 and resume from the latest checkpoint.
>>>
>>> Mans - So if we are saving checkpoints in S3 then there is no need for disks - should we use emptyDir?
>>>
>>> [1]. https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html
>>>
>>> M Singh <mans2si...@yahoo.com> wrote on Sun, Feb 23, 2020 at 2:28 AM:
>>>
>>> Hey Folks:
>>>
>>> I am trying to figure out the options for running Flink on Kubernetes and am trying to find out the pros and cons of running in Flink Session vs Flink Job Cluster mode (https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html#flink-session-cluster-on-kubernetes).
>>>
>>> I understand that in job mode there is no need to submit the job since it is part of the job image.
>>> But what are the other pros and cons of this approach vs session mode, where a JobManager is deployed and Flink jobs can be submitted to it? Are there any benefits with regards to:
>>>
>>> 1. Configuring the jobs
>>> 2. Scaling the taskmanager
>>> 3. Restarting jobs
>>> 4. Managing the flink jobs
>>> 5. Passing credentials (in case of AWS, etc)
>>> 6. Fault tolerance and recovery of jobs from failure
>>>
>>> Also, we will be keeping the checkpoints for the jobs on S3. Is there any need for specifying a volume for the pods? If a volume is required, do we need a provisioned volume, and what are the recommended alternatives/considerations, especially with AWS?
>>>
>>> If there are any other considerations, please let me know.
>>>
>>> Thanks for your advice.