Re: About Native Deployment's Autoscaling implementation

2022-06-01 Thread Yang Wang
Hi Talat, Using sub resources for the auto scaling makes a lot of sense to me. Could you be more specific why you think changing task manager count will not work for native deployment? The native K8s integration uses an active ResourceManager. It means that the TaskManager count will be cal

Re: About Native Deployment's Autoscaling implementation

2022-05-31 Thread Talat Uyarer
Hi Yang and Gyula, Yang, Could you give a little bit more information? What prevents us from changing the task managers' count? I am aware of Flink's ActiveResourceManager. But Flink only calls resources when it initializes a cluster. If we set - jobmanager.scheduler: adaptive - cluster.dec

Re: About Native Deployment's Autoscaling implementation

2022-05-30 Thread Yang Wang
> I thought we could enable Adaptive Scheduler, so adding or removing a task manager is the same as restarting a job when we use an adaptive scheduler. Am I missing anything?
It is true for standalone mode since adding/removing a TaskManager pod is fully controlled by users (or external tools)

Re: About Native Deployment's Autoscaling implementation

2022-05-29 Thread Gyula Fóra
Hi Talat! Sorry for the late reply, I have been busy with some fixes for the release and travelling. I think the Prometheus metrics integration sounds like a great idea that would cover the needs of most users. This way users can also integrate easily with custom Flink metrics. maxReplic

Re: About Native Deployment's Autoscaling implementation

2022-05-25 Thread Talat Uyarer
Hi Yang, I thought we could enable Adaptive Scheduler, so adding or removing a task manager is the same as restarting a job when we use an adaptive scheduler. Am I missing anything? Thanks
On Tue, May 24, 2022 at 8:16 PM Yang Wang wrote:
> Thanks for the interesting discussion.
>
> Compared with

Re: About Native Deployment's Autoscaling implementation

2022-05-25 Thread Talat Uyarer
Hi Gyula, I did some investigation. Kubernetes developers suggest not using the Kubernetes metrics system for application-specific metrics [1]. Currently the only possible workflow is via Prometheus. Prometheus is widely used in Kubernetes deployments. The Kubernetes metrics system provides a Custom Metrics API, e

Re: About Native Deployment's Autoscaling implementation

2022-05-24 Thread Yang Wang
Thanks for the interesting discussion. Compared with reactive mode, leveraging the flink-kubernetes-operator to do the job restarting/upgrading is another solution for auto-scaling. Given that fully restarting a Flink application on K8s is not too slow, this is a reasonable way. Really hope we cou

Re: About Native Deployment's Autoscaling implementation

2022-05-24 Thread Gyula Fóra
Hi Talat! It would be great to have an HPA that works based on some Flink throughput/backlog metrics. I wonder how you are going to access the Flink metrics in the HPA; we might need some integration with the k8s metrics system. In any case whether we need a FLIP or not depends on the complexity, i

Re: About Native Deployment's Autoscaling implementation

2022-05-24 Thread Talat Uyarer
Hi Gyula, This seems very promising for initial scaling. We are using the Flink Kubernetes Operator. Most probably we are very early adopters of it :) Let me try it. I will get back to you soon. My plan is to build general-purpose CPU and backlog/throughput based autoscaling for Flink. I can create a Cus

Re: About Native Deployment's Autoscaling implementation

2022-05-23 Thread Gyula Fóra
Hi Talat! One other approach that we are currently investigating is combining the Flink Kubernetes Operator with the K8s scaling capabilities (Horizontal Pod Autoscaler). In this approach the HPA monitors the TaskManager pods directly and can m

Re: About Native Deployment's Autoscaling implementation

2022-05-23 Thread David Morávek
Hi Talat, This is definitely an interesting and rather complex topic. A few unstructured thoughts / notes / questions:
- The main struggle has always been that it's hard to come up with a generic one-size-fits-all metric for autoscaling.
- Flink doesn't have knowledge of the external environ

About Native Deployment's Autoscaling implementation

2022-05-22 Thread Talat Uyarer
Hi, I am working on auto scaling support for native deployments. Today Flink provides Reactive Mode; however, it only runs on standalone deployments. We use Kubernetes native deployment, so I want to increase or decrease job resources for our streaming jobs. The recent FLIP-138 and FLIP-160 are very usefu