Hi Yang,

Thank you for your reply.

Yes, we have evaluated job-specific clusters (we used to deploy the same
way on YARN). The main issue is monitoring multiple jobs, since we won't
have a single endpoint the way YARN does. We will evaluate the K8s
operators you suggested.
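
In the meantime we are thinking of leaning on Kubernetes itself for
discovery instead of a single REST endpoint. A minimal sketch, assuming the
clusters are deployed with Flink's native K8s integration (which, as far as
I can tell, labels its resources with type=flink-native-kubernetes and
names each cluster's REST service <cluster-id>-rest; both names should be
verified for the Flink version in use):

  # List every native-K8s Flink cluster across all namespaces
  # (each job/application cluster gets its own JobManager deployment).
  kubectl get deployments --all-namespaces -l type=flink-native-kubernetes

  # Port-forward to one cluster's REST service to inspect its jobs;
  # "my-job-cluster" is a hypothetical cluster-id.
  kubectl port-forward service/my-job-cluster-rest 8081:8081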


Thanks and Regards,
Vinay Patil


On Wed, Jul 29, 2020 at 11:08 AM Yang Wang <danrtsey...@gmail.com> wrote:

> Hi Vinay Patil,
>
> You are right. Flink does not provide any isolation between different jobs
> in the same Flink session cluster.
> You could use a Flink job cluster or an application cluster (from 1.11) to
> get better isolation, since a dedicated Flink cluster will be started for
> each job.
>
> Please refer to the standalone K8s job cluster[1] or the native K8s
> application mode[2] for more information.
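>
> For example, in 1.11 the native application mode starts a dedicated
> cluster for a job with a single CLI call. A minimal sketch (the
> cluster-id, image name, and jar path are placeholders to fill in):
>
>   $ ./bin/flink run-application -t kubernetes-application \
>       -Dkubernetes.cluster-id=my-application-cluster \
>       -Dkubernetes.container.image=<custom-image-with-job-jar> \
>       local:///opt/flink/usrlib/my-flink-job.jar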
>
> If you want a tool for managing multiple jobs, the flink-k8s-operator
> may be a good choice[3][4].
> I am also trying to build a Java-implemented flink-native-k8s-operator[5];
> please check it out if you are interested.
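>
> The idea is that each job becomes a custom resource that the operator
> reconciles into its own dedicated cluster, so kubectl becomes the single
> management endpoint. A rough sketch of what that could look like with the
> operator in [4] (the apiVersion and field names here are written from
> memory of its samples, so please double-check them against that repo):
>
>   kubectl apply -f - <<'EOF'
>   apiVersion: flinkoperator.k8s.io/v1beta1
>   kind: FlinkCluster
>   metadata:
>     name: wordcount-job-cluster
>   spec:
>     image:
>       name: flink:1.11.1
>     jobManager:
>       resources:
>         limits:
>           cpu: "1"
>           memory: "1Gi"
>     taskManager:
>       replicas: 2
>       resources:
>         limits:
>           cpu: "1"
>           memory: "2Gi"
>     job:
>       jarFile: /opt/flink/examples/streaming/WordCount.jar
>       parallelism: 2
>   EOF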
>
> [1].
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/kubernetes.html#deploy-job-cluster
> [2].
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html#flink-kubernetes-application
> [3]. https://github.com/lyft/flinkk8soperator
> [4]. https://github.com/GoogleCloudPlatform/flink-on-k8s-operator
> [5]. https://github.com/wangyang0918/flink-native-k8s-operator
>
>
> Best,
> Yang
>
> Vinay Patil <vinay18.pa...@gmail.com> wrote on Wed, Jul 29, 2020 at 12:15 AM:
>
>> Hi Team,
>>
>> We have a session cluster running on K8s where multiple stateless jobs are
>> running fine. We observed that once we submit a stateful job (state size
>> per checkpoint is 1GB) to the same session cluster, the other jobs are
>> impacted, because the stateful job starts to utilise more memory and CPU,
>> eventually terminating the pod.
>>
>> To mitigate this issue and provide better resource isolation, we have
>> created multiple session clusters: we launch a high-throughput (stateful)
>> job in one cluster and group the low-throughput jobs in another cluster.
>> This seems to work fine, but managing it will be painful once we start to
>> create more session clusters for high-throughput jobs (10-plus jobs), as
>> we will not have a single Flink endpoint to submit jobs to (as we have in
>> YARN, where we submit directly to the RM).
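>>
>> For illustration, each session cluster currently has to be addressed by
>> its own JobManager REST endpoint at submission time (the hostnames below
>> are made up):
>>
>>   # submit to the dedicated high-throughput cluster
>>   ./bin/flink run -m flink-stateful-jm:8081 heavy-job.jar
>>   # submit to the shared low-throughput cluster
>>   ./bin/flink run -m flink-shared-jm:8081 light-job.jar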
>>
>> Can you please provide inputs on how we should handle this better in
>> Kubernetes?
>>
>>
>>
>> Regards,
>> Vinay Patil
>>
>
