Re: Flink on Kubernetes - Session vs Job cluster mode and storage

Yang Wang Sun, 23 Feb 2020 19:01:56 -0800

Hi Singh,

Glad to hear that you are looking to run Flink on Kubernetes. I will try to
answer your questions based on my limited knowledge; others are welcome to
correct me and add more details.


I think the biggest difference between a session cluster and a per-job cluster
on Kubernetes is isolation. For per-job, a dedicated Flink cluster is started
for exactly one job, and no other jobs can be submitted to it. Once the job
finishes, the Flink cluster is destroyed immediately.
The second point is one-step submission. You do not need to start a Flink
cluster first and then submit a job to the existing session.

> Are there any benefits with regards to
1. Configuring the jobs
Whether you use a per-job cluster or submit to an existing session cluster,
the configuration mechanism is the same. You do not have to change any code
or configuration.

2. Scaling the taskmanager
Since you are using the standalone cluster on Kubernetes, it does not provide
an active resourcemanager. You need to use external tools to monitor and
scale the taskmanagers. The active integration is still evolving, and you
can try it out[1].
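
As a rough sketch of what such an external tool has to do: work out how many
taskmanagers the job needs, then scale the taskmanager Deployment. This
assumes default slot sharing (the job needs as many slots as its highest
parallelism). The Deployment name flink-taskmanager, the namespace and the
helper names are assumptions for illustration, not something Flink provides.

```python
import math
from typing import List


def desired_taskmanagers(parallelism: int, slots_per_tm: int) -> int:
    """Smallest number of taskmanagers whose slots cover the parallelism."""
    if parallelism < 1 or slots_per_tm < 1:
        raise ValueError("parallelism and slots_per_tm must be positive")
    return math.ceil(parallelism / slots_per_tm)


def scale_command(replicas: int,
                  deployment: str = "flink-taskmanager",  # assumed name
                  namespace: str = "default") -> List[str]:
    """Build the kubectl command that resizes the taskmanager Deployment."""
    return ["kubectl", "scale", f"deployment/{deployment}",
            f"--replicas={replicas}", "--namespace", namespace]


if __name__ == "__main__":
    # A job with parallelism 10 on taskmanagers with 4 slots each.
    print(" ".join(scale_command(desired_taskmanagers(10, 4))))
```

The command is only built here, not executed, so you can plug it into
whatever monitoring loop or operator you already run.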

3. Restarting jobs
For the session cluster, you can directly cancel the job and re-submit it.
For a per-job cluster, when the job is canceled, you need to start a new
per-job cluster from the latest savepoint.
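
To illustrate the per-job case, here is a small sketch that builds the
container args for a new job cluster that restores from a savepoint. It
assumes the image entrypoint accepts job-cluster with --job-classname and
--fromSavepoint, as in the standalone job cluster templates; please check
this against the Flink version you deploy. The class name and savepoint path
in the example are made up.

```python
from typing import List, Optional


def job_cluster_args(job_class: str,
                     savepoint_path: Optional[str] = None,
                     allow_non_restored: bool = False) -> List[str]:
    """Args for the job cluster container, optionally restoring a savepoint."""
    args = ["job-cluster", "--job-classname", job_class]
    if savepoint_path:
        args += ["--fromSavepoint", savepoint_path]
        # Only meaningful together with a savepoint: skip state that no
        # longer maps to an operator in the new job graph.
        if allow_non_restored:
            args.append("--allowNonRestoredState")
    return args


if __name__ == "__main__":
    # Hypothetical job class and savepoint location.
    print(job_cluster_args("com.example.MyJob",
                           "s3://my-bucket/savepoints/savepoint-1"))
```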

4. Managing the flink jobs
The REST API and the flink command line can be used to manage jobs (e.g.
flink cancel). I think there is no difference between session and per-job
here.
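
For example, listing and canceling jobs over the REST API could look roughly
like the sketch below. It assumes the jobmanager REST endpoint is reachable
at a base URL such as http://localhost:8081 (e.g. through a port-forward);
the helper names are mine. The same calls work against a session or a
per-job cluster.

```python
import json
import urllib.request


def list_jobs_request(base_url: str) -> urllib.request.Request:
    """GET /jobs/overview returns the jobs known to the cluster."""
    return urllib.request.Request(f"{base_url}/jobs/overview", method="GET")


def cancel_job_request(base_url: str, job_id: str) -> urllib.request.Request:
    """PATCH /jobs/<jobid>?mode=cancel asks the jobmanager to cancel a job."""
    return urllib.request.Request(f"{base_url}/jobs/{job_id}?mode=cancel",
                                  method="PATCH")


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response (empty body -> {})."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        body = resp.read()
    return json.loads(body) if body else {}


if __name__ == "__main__":
    base = "http://localhost:8081"  # assumed address of the jobmanager
    print(send(list_jobs_request(base)))
```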

5. Passing credentials (in case of AWS, etc)
I am not sure how you provide your credentials. If you put them in a
config map and mount it into the jobmanager/taskmanager pods, then both
session and per-job support this.

6. Fault tolerance and recovery of jobs from failure
For a session cluster, if one taskmanager crashes, all jobs that have tasks
on this taskmanager will fail.
Both session and per-job can be configured with high availability and
recover from the latest checkpoint.

> Is there any need for specifying volume for the pods?
No, you do not need to specify a volume for the pods. All data in the pod's
local directory is temporary. When a pod crashes and is relaunched, the job
is restored from the latest checkpoint, using the checkpoint pointer in
ZooKeeper and the checkpoint data on S3.
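
The ZooKeeper + S3 setup boils down to a handful of flink-conf.yaml entries.
The sketch below just renders them; the quorum address, bucket name,
cluster id and directory layout are placeholders, and it assumes an S3
filesystem plugin is available in your image. Treat it as a starting point
rather than a complete configuration.

```python
from typing import Dict


def ha_checkpoint_conf(zk_quorum: str, bucket: str,
                       cluster_id: str) -> Dict[str, str]:
    """flink-conf.yaml entries for ZooKeeper HA with checkpoints on S3."""
    return {
        "high-availability": "zookeeper",
        "high-availability.zookeeper.quorum": zk_quorum,
        "high-availability.cluster-id": cluster_id,
        # HA metadata lives here; ZooKeeper only keeps pointers to it.
        "high-availability.storageDir": f"s3://{bucket}/flink/ha",
        "state.backend": "filesystem",
        "state.checkpoints.dir": f"s3://{bucket}/flink/checkpoints",
    }


def render(conf: Dict[str, str]) -> str:
    """Render the entries as 'key: value' lines for flink-conf.yaml."""
    return "\n".join(f"{key}: {value}" for key, value in conf.items())


if __name__ == "__main__":
    # Placeholder quorum, bucket and cluster id.
    print(render(ha_checkpoint_conf("zk-0.zk:2181", "my-bucket", "/my-job")))
```

Since nothing here points at a local path, the pods can stay without
volumes.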


[1] https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html

M Singh <mans2si...@yahoo.com> wrote on Sun, Feb 23, 2020 at 2:28 AM:

> Hey Folks:
>
> I am trying to figure out the options for running Flink on Kubernetes and
> am trying to find out the pros and cons of running in Flink Session vs
> Flink Cluster mode (
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html#flink-session-cluster-on-kubernetes
> ).
>
> I understand that in job mode there is no need to submit the job since it
> is part of the job image.  But what are the other pros and cons of this
> approach vs session mode, where a job manager is deployed and flink jobs can
> be submitted to it?  Are there any benefits with regards to:
>
> 1. Configuring the jobs
> 2. Scaling the taskmanager
> 3. Restarting jobs
> 4. Managing the flink jobs
> 5. Passing credentials (in case of AWS, etc)
> 6. Fault tolerance and recovery of jobs from failure
>
> Also, we will be keeping the checkpoints for the jobs on S3.  Is there any
> need to specify a volume for the pods?  If a volume is required, do we need
> a provisioned volume, and what are the recommended alternatives/considerations,
> especially with AWS?
>
> If there are any other considerations, please let me know.
>
> Thanks for your advice.
