Hi Jin Yi,

For a standalone per-job cluster, recovery works a little differently. Just as you say, the user jar is built into the image, so when the JobManager fails and is relaunched by K8s, the user's `main()` is executed again to regenerate the job graph, unlike a session cluster, which recovers the job graph from the high-availability storage. The job is then submitted again and can recover from the latest checkpoint (assuming you have configured high availability).
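For reference, here is a minimal flink-conf.yaml sketch of such a setup (just an illustration; the ZooKeeper quorum, the S3 bucket, and the cluster-id below are placeholders you would replace with your own values):

  # ZooKeeper-based HA, so a relaunched JobManager can find the latest completed checkpoint
  high-availability: zookeeper
  high-availability.zookeeper.quorum: zk-0:2181,zk-1:2181,zk-2:2181
  high-availability.storageDir: s3://my-flink-bucket/ha/
  # use a distinct cluster-id for each per-job cluster
  high-availability.cluster-id: /my-per-job-cluster
  # checkpoints on S3, so no persistent volume is needed for the pods
  state.backend: filesystem
  state.checkpoints.dir: s3://my-flink-bucket/checkpoints/
  # restart strategy applied when tasks fail
  restart-strategy: fixed-delay
  restart-strategy.fixed-delay.attempts: 3
  restart-strategy.fixed-delay.delay: 10 s

With options like these, when the relaunched JobManager runs `main()` and re-submits the job, it could pick up the latest completed checkpoint from ZooKeeper/S3 and resume from there.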
Best,
Yang

Jin Yi <eleanore....@gmail.com> wrote on Thu, Feb 27, 2020 at 2:50 PM:

> Hi Yang,
>
> Regarding your statement below:
>
> Since you are starting the JM/TM with a K8s deployment, new JM/TM pods will be created when they fail. If you do not set the high-availability configuration, your jobs can recover when a TM fails, but they cannot recover when the JM fails. With HA configured, the jobs can always be recovered and you do not need to re-submit them.
>
> Does this also apply to a Flink Job Cluster? When the JM pod is restarted by Kubernetes, the image also contains the application jar, so if the statement applies to the Flink Job Cluster mode as well, can you please elaborate on why?
>
> Thanks a lot!
> Eleanore
>
> On Mon, Feb 24, 2020 at 6:36 PM Yang Wang <danrtsey...@gmail.com> wrote:
>
>> Hi M Singh,
>>
>>> Mans - If we use the session based deployment option for K8s - I thought K8s will automatically restart any failed TM or JM. In the case of a failed TM the job will probably recover, but in the case of a failed JM perhaps we need to resubmit all jobs. Let me know if I have misunderstood anything.
>>
>> Since you are starting the JM/TM with a K8s deployment, new JM/TM pods will be created when they fail. If you do not set the high-availability configuration, your jobs can recover when a TM fails, but they cannot recover when the JM fails. With HA configured, the jobs can always be recovered and you do not need to re-submit them.
>>
>>> Mans - Is there any safe way of passing creds?
>>
>> Yes, you are right, using a ConfigMap to pass the credentials is not safe. On K8s, I think you could use Secrets instead [1].
>>
>>> Mans - Does a task manager failure cause the job to fail? My understanding is that JM failures are catastrophic while TM failures are recoverable.
>>
>> What I mean is that the job fails, and it can then be restarted by your configured restart strategy [2].
>>
>>> Mans - So if we are saving checkpoints in S3 then there is no need for disks - should we use emptyDir?
>>
>> Yes, if you are saving the checkpoints in S3 and also set `high-availability.storageDir` to S3, then you do not need a persistent volume. Since the local directory is only used as a local cache, you can directly use the overlay filesystem or emptyDir (better IO performance).
>>
>> [1]. https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/
>> [2]. https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#fault-tolerance
>>
>> M Singh <mans2si...@yahoo.com> wrote on Tue, Feb 25, 2020 at 5:53 AM:
>>
>>> Thanks Wang for your detailed answers.
>>>
>>> From what I understand, the native_kubernetes integration also leans towards creating a session and submitting a job to it.
>>>
>>> Regarding the other recommendations, please see my inline comments and advise.
>>>
>>> On Sunday, February 23, 2020, 10:01:10 PM EST, Yang Wang <danrtsey...@gmail.com> wrote:
>>>
>>> Hi Singh,
>>>
>>> Glad to hear that you are looking to run Flink on Kubernetes. I am trying to answer your questions based on my limited knowledge, and others could correct me and add some more supplements.
>>>
>>> I think the biggest difference between a session cluster and a per-job cluster on Kubernetes is the isolation. For per-job, a dedicated Flink cluster will be started for that single job, and no other jobs can be submitted.
>>> Once the job is finished, the Flink cluster will be destroyed immediately. The second point is one-step submission: you do not need to start a Flink cluster first and then submit a job to the existing session.
>>>
>>> > Are there any benefits with regards to
>>>
>>> 1. Configuring the jobs
>>> Whether you are using a per-job cluster or submitting to an existing session cluster, they share the same configuration mechanism. You do not have to change any code or configuration.
>>>
>>> 2. Scaling the taskmanager
>>> Since you are using a standalone cluster on Kubernetes, it does not provide an active ResourceManager. You need to use external tools to monitor and scale up the TaskManagers. The active integration is still evolving and you could give it a try [1].
>>>
>>> Mans - If we use the session based deployment option for K8s - I thought K8s will automatically restart any failed TM or JM. In the case of a failed TM the job will probably recover, but in the case of a failed JM perhaps we need to resubmit all jobs. Let me know if I have misunderstood anything.
>>>
>>> 3. Restarting jobs
>>> For a session cluster, you can directly cancel the job and re-submit it. For a per-job cluster, when the job is canceled, you need to start a new per-job cluster from the latest savepoint.
>>>
>>> 4. Managing the flink jobs
>>> The REST API and the flink command line can be used to manage the jobs (e.g. flink cancel, etc.). I think there is no difference between session and per-job here.
>>>
>>> 5. Passing credentials (in case of AWS, etc)
>>> I am not sure how you provide your credentials. If you put them in a ConfigMap and then mount it into the JobManager/TaskManager pods, both session and per-job clusters support this.
>>>
>>> Mans - Is there any safe way of passing creds?
>>>
>>> 6. Fault tolerance and recovery of jobs from failure
>>> For a session cluster, if one TaskManager crashes, then all the jobs that have tasks on that TaskManager will fail. Both session and per-job clusters can be configured with high availability and recover from the latest checkpoint.
>>>
>>> Mans - Does a task manager failure cause the job to fail? My understanding is that JM failures are catastrophic while TM failures are recoverable.
>>>
>>> > Is there any need for specifying volume for the pods?
>>> No, you do not need to specify a volume for the pods. All the data in the pod's local directory is temporary. When a pod crashes and is relaunched, the TaskManager will retrieve the checkpoint from ZooKeeper + S3 and resume from the latest checkpoint.
>>>
>>> Mans - So if we are saving checkpoints in S3 then there is no need for disks - should we use emptyDir?
>>>
>>> [1]. https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html
>>>
>>> M Singh <mans2si...@yahoo.com> wrote on Sun, Feb 23, 2020 at 2:28 AM:
>>>
>>> Hey Folks:
>>>
>>> I am trying to figure out the options for running Flink on Kubernetes and am trying to find out the pros and cons of running in Flink Session vs Flink Job Cluster mode (https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html#flink-session-cluster-on-kubernetes).
>>>
>>> I understand that in job mode there is no need to submit the job since it is part of the job image.
>>> But what are the other pros and cons of this approach vs session mode, where a JobManager is deployed and Flink jobs can be submitted to it? Are there any benefits with regards to:
>>>
>>> 1. Configuring the jobs
>>> 2. Scaling the taskmanager
>>> 3. Restarting jobs
>>> 4. Managing the flink jobs
>>> 5. Passing credentials (in case of AWS, etc)
>>> 6. Fault tolerance and recovery of jobs from failure
>>>
>>> Also, we will be keeping the checkpoints for the jobs on S3. Is there any need for specifying a volume for the pods? If a volume is required, do we need a provisioned volume, and what are the recommended alternatives/considerations, especially with AWS?
>>>
>>> If there are any other considerations, please let me know.
>>>
>>> Thanks for your advice.