Re: Flink on Kubernetes - Session vs Job cluster mode and storage

M Singh Wed, 26 Feb 2020 16:35:21 -0800

 BTW - Is there any limit to the amount of data that can be stored on emptyDir 
in K8 ?  
    On Wednesday, February 26, 2020, 07:33:54 PM EST, M Singh 
<[email protected]> wrote:  
 
  Thanks Yang and Arvid for your advice and pointers.  Mans
    On Wednesday, February 26, 2020, 09:52:26 AM EST, Arvid Heise 
<[email protected]> wrote:  
 
 Creds on AWS are typically resolved through roles assigned to K8s pods (for 
example with KIAM [1]).
[1] https://github.com/uswitch/kiam
On Tue, Feb 25, 2020 at 3:36 AM Yang Wang <[email protected]> wrote:


Hi M Singh,

> Mans - If we use the session based deployment option for K8 - I thought K8 
>will automatically restarts any failed TM or JM. 
In the case of failed TM - the job will probably recover, but in the case of 
failed JM - perhaps we need to resubmit all jobs.
Let me know if I have misunderstood anything.

Since you are starting JM/TM with K8s deployment, when they failed new JM/TM 
will be created. If you do not set the highavailability configuration, your 
jobs could recover when TM failed. However, they could not recover when JM 
failed. With HA
configured, the jobs could always be recovered and you do not need to re-submit 
again.

> Mans - Is there any safe way of a passing creds ?

Yes, you are right, Using configmap to pass the credentials is not safe. On 
K8s, i think you could use secrets instead[1].

> Mans - Does a task manager failure cause the job to fail ?  My understanding 
> is the JM failure are catastrophic while TM failures are recoverable.

What i mean is the job failed, and it could be restarted by your configured 
restart strategy[2].

> Mans - So if we are saving checkpoint in S3 then there is no need for disks - 
>should we use emptyDir ?
 Yes, if you are saving the checkpoint in S3 and also set the 
`high-availability.storageDir` to S3. Then you do not need persistent volume. 
Sincethe local directory is only used for local cache, so you could directly 
use the overlay filesystem or empryDir(better io performance).

[1]. 
https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/[2].
 
https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#fault-tolerance
M Singh <[email protected]> 于2020年2月25日周二 上午5:53写道：

 Thanks Wang for your detailed answers.
>From what I understand the native_kubernetes also leans towards creating a 
>session and submitting a job to it.  
Regarding other recommendations, please my inline comments and advice.
    On Sunday, February 23, 2020, 10:01:10 PM EST, Yang Wang 
<[email protected]> wrote:  
 
 Hi Singh,
Glad to hear that you are looking to run Flink on the Kubernetes. I amtrying to 
answer your question based on my limited knowledge andothers could correct me 
and add some more supplements.
I think the biggest difference between session cluster and per-job clusteron 
Kubernetesis the isolation. Since for per-job, a dedicated Flink clusterwill be 
started for the only one job and no any other jobs could be submitted.Once the 
job is finished, then the Flink cluster will be destroyed immediately.The 
second point is one-step submission. You do not need to start a Flinkcluster 
first and then submit a job to the existing session.
> Are there any benefits with regards to1. Configuring the jobsNo matter you 
>are using the per-job cluster or submitting to the existingsession cluster, 
>they share the configuration mechanism. You do not haveto change any codes and 
>configurations.
2. Scaling the taskmanagerSince you are using the Standalone cluster on 
Kubernetes, it do not providean active resourcemanager. You need to use 
external tools to monitor andscale up the taskmanagers. The active integration 
is still evolving and youcould have a taste[1].
Mans - If we use the session based deployment option for K8 - I thought K8 will 
automatically restarts any failed TM or JM. In the case of failed TM - the job 
will probably recover, but in the case of failed JM - perhaps we need to 
resubmit all jobs.Let me know if I have misunderstood anything.
3. Restarting jobsFor the session cluster, you could directly cancel the job 
and re-submit. Andfor per-job cluster, when the job is canceled, you need to 
start a new per-jobcluster from the latest savepoint.
4. Managing the flink jobsThe rest api and flink command line could be used to 
managing the jobs(e.g.flink cancel, etc.). I think there is no difference for 
session and per-job here.
5. Passing credentials (in case of AWS, etc)
I am not sure how do you provide your credentials. If you put them in the 
config map and then mount into the jobmanager/taskmanager pod, then bothsession 
and per-job could support this way.
Mans - Is there any safe way of a passing creds ?
6. Fault tolerence and recovery of jobs from failure
For session cluster, if one taskmanager crashed, then all the jobs which have 
taskson this taskmanager will failed. Both session and per-job could be 
configured with high availability and recoverfrom the latest checkpoint. 
Mans - Does a task manager failure cause the job to fail ?  My understanding is 
the JM failure are catastrophic while TM failures are recoverable.
> Is there any need for specifying volume for the pods?No, you do not need to 
>specify a volume for pod. All the data in the pod local directory is 
>temporary. When a pod crashed and relaunched, thetaskmanager will retrieve the 
>checkpoint from zookeeper + S3 and resumefrom the latest checkpoint.
Mans - So if we are saving checkpoint in S3 then there is no need for disks - 
should we use emptyDir ?

[1]. 
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html
M Singh <[email protected]> 于2020年2月23日周日 上午2:28写道：

Hey Folks:
I am trying to figure out the options for running Flink on Kubernetes and am 
trying to find out the pros and cons of running in Flink Session vs Flink 
Cluster mode 
(https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html#flink-session-cluster-on-kubernetes).
I understand that in job mode there is no need to submit the job since it is 
part of the job image.  But what are other the pros and cons of this approach 
vs session mode where a job manager is deployed and flink jobs can be submitted 
it ?  Are there any benefits with regards to:
1. Configuring the jobs 2. Scaling the taskmanager3. Restarting jobs4. Managing 
the flink jobs5. Passing credentials (in case of AWS, etc)6. Fault tolerence 
and recovery of jobs from failure
Also, we will be keeping the checkpoints for the jobs on S3.  Is there any need 
for specifying volume for the pods ?  If volume is required do we need 
provisioned volume and what are the recommended alternatives/considerations 
especially with AWS.
If there are any other considerations, please let me know.
Thanks for your advice.

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

Reply via email to