Hi Yang,

Thank you for your reply.
Yes, we have evaluated job-specific clusters (we used to deploy the same way on YARN). The main issue is monitoring multiple jobs, since we won't have a single endpoint the way YARN provides one. We will evaluate the K8s operators you have suggested; as a stop-gap for monitoring, see the sketch of polling each cluster's REST endpoint after the quoted thread below.

Thanks and Regards,
Vinay Patil

On Wed, Jul 29, 2020 at 11:08 AM Yang Wang <danrtsey...@gmail.com> wrote:

> Hi Vinay Patil,
>
> You are right. Flink does not provide any isolation between different jobs
> in the same Flink session cluster.
> You could use a Flink job cluster or an application cluster (from 1.11) to
> get better isolation, since a dedicated Flink cluster will be started for
> each job.
>
> Please refer to the standalone K8s job cluster [1] or native K8s
> application mode [2] for more information.
>
> If you want a tool for managing multiple jobs, a flink-k8s-operator may be
> a good choice [3][4].
> I am also trying to build a Java-implemented flink-native-k8s-operator [5];
> please check it out if you are interested.
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/kubernetes.html#deploy-job-cluster
> [2] https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html#flink-kubernetes-application
> [3] https://github.com/lyft/flinkk8soperator
> [4] https://github.com/GoogleCloudPlatform/flink-on-k8s-operator
> [5] https://github.com/wangyang0918/flink-native-k8s-operator
>
> Best,
> Yang
>
> Vinay Patil <vinay18.pa...@gmail.com> wrote on Wed, Jul 29, 2020 at 12:15 AM:
>
>> Hi Team,
>>
>> We have a session cluster running on K8s where multiple stateless jobs
>> are running fine. We observed that once we submit a stateful job (state
>> size per checkpoint is 1 GB) to the same session cluster, the other jobs
>> are impacted, because this job starts to use more memory and CPU and the
>> pod is eventually terminated.
>>
>> To mitigate this and provide better resource isolation, we have created
>> multiple session clusters: we launch a high-throughput (stateful) job in
>> one cluster and club the low-throughput jobs together in another cluster.
>> This works fine, but managing it will become painful once we start
>> creating more session clusters for high-throughput jobs (10-plus jobs),
>> as we will not have a single Flink endpoint to submit jobs to (as we have
>> in YARN, where we submit directly to the ResourceManager).
>>
>> Can you please give us your input on how to handle this better on
>> Kubernetes?
>>
>> Regards,
>> Vinay Patil
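P.S. Until one of the operators is in place, a possible stop-gap for the "single endpoint" concern is to poll the monitoring REST API of each per-job cluster (GET /jobs/overview) and aggregate the results in one place. Below is a minimal sketch in plain Java (JDK 11+); the two endpoint URLs are hypothetical Kubernetes service names, not anything from this thread, and would need to be replaced with the actual rest services fronting each JobManager.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;

/**
 * Minimal sketch: poll the REST endpoint of each per-job Flink cluster
 * and print the raw /jobs/overview response, so all jobs can be watched
 * from one place even without a single YARN-style ResourceManager endpoint.
 */
public class FlinkJobsOverviewPoller {

    // Illustrative endpoints only -- in practice these would be the
    // Kubernetes services (or ingresses) fronting each JobManager.
    private static final List<String> REST_ENDPOINTS = List.of(
            "http://flink-stateful-job-rest.flink.svc:8081",
            "http://flink-stateless-jobs-rest.flink.svc:8081");

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();

        for (String endpoint : REST_ENDPOINTS) {
            // /jobs/overview is part of Flink's monitoring REST API and
            // returns the id, name and state of every job on that cluster.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(endpoint + "/jobs/overview"))
                    .timeout(Duration.ofSeconds(5))
                    .GET()
                    .build();

            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());

            System.out.println(endpoint + " -> HTTP " + response.statusCode());
            System.out.println(response.body());
        }
    }
}

Instead of hard-coding the endpoints, the per-cluster rest services could also be discovered by listing Kubernetes services that share a label; that discovery and lifecycle management is essentially what the operators in [3][4][5] automate.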