Re: Flink operator autoscaler scaling down

2023-12-11 Thread Gyula Fóra
Could you please elaborate a little on the scenarios in which you find that the pending records metric is not a good way to track Kafka lag? Thanks, Gyula

Re: Flink operator autoscaler scaling down

2023-12-11 Thread Yang LI
Hello, Following our recent discussion, I've successfully implemented a Flink operator featuring a "memory protection" patch. However, in the course of my testing, I've encountered an issue: the Flink operator relies on the 'pending_record' metric to gauge backlog. Unfortunately, this metric doesn
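
For context on what "lag" means in this exchange: Kafka consumer lag is commonly computed per partition as the log-end offset minus the consumer's current position, then summed over partitions. The sketch below assumes that definition. The `PartitionOffsets` type, the sample offsets, and the `catch_up_seconds` parameter are hypothetical and are not taken from the Flink autoscaler. The last function shows one assumed way a backlog-aware scaler could turn lag into a required processing rate: keep up with the input rate and also drain the backlog within a target time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionOffsets:
    """Offsets for one Kafka partition (hypothetical example data)."""
    topic: str
    partition: int
    log_end_offset: int  # next offset the broker will write
    current_offset: int  # next offset the consumer will read


def partition_lag(p: PartitionOffsets) -> int:
    # Clamp at zero: offsets sampled at slightly different moments can
    # briefly make the consumer appear ahead of the log end.
    return max(0, p.log_end_offset - p.current_offset)


def total_lag(partitions: list[PartitionOffsets]) -> int:
    """Sum of per-partition lag across all partitions."""
    return sum(partition_lag(p) for p in partitions)


def required_rate(input_rate: float, lag: int, catch_up_seconds: float) -> float:
    """Records/s needed to keep up with input AND drain `lag` within
    `catch_up_seconds` (assumed model, not the autoscaler's actual code)."""
    if catch_up_seconds <= 0:
        raise ValueError("catch_up_seconds must be positive")
    return input_rate + lag / catch_up_seconds


if __name__ == "__main__":
    parts = [
        PartitionOffsets("events", 0, 1_500, 1_200),
        PartitionOffsets("events", 1, 900, 900),
        PartitionOffsets("events", 2, 2_000, 1_400),
    ]
    lag = total_lag(parts)
    print(lag)  # 900 (300 + 0 + 600)
    print(required_rate(input_rate=100.0, lag=lag, catch_up_seconds=300.0))  # 103.0
```

Clamping each partition at zero is a deliberate choice: a transient negative value on one partition would otherwise cancel out real lag on the others.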

Re: Flink operator autoscaler scaling down

2023-11-15 Thread Yang LI
Thanks Maximilian and Gyula, I'll keep you updated. Best, Yang

Re: Flink operator autoscaler scaling down

2023-11-11 Thread Maximilian Michels
Hi Yang, We're always open to changes / additions to the autoscaler logic and metric collection. Preferably, we change these directly in the autoscaler implementation, without adding additional processes or controllers. Let us know how your experiments go! If you want to contribute, a JIRA with a

Re: Flink operator autoscaler scaling down

2023-11-07 Thread Yang LI
Hi Gyula, Thank you for the feedback! With your permission, I plan to integrate the implementation into the flink-kubernetes-operator-autoscaler module to test it in my environment. Afterwards, I may contribute these changes back to the community by submitting a pull request to the GitHub repository in

Re: Flink operator autoscaler scaling down

2023-11-07 Thread Gyula Fóra
Sounds like a lot of work for very little gain to me. If you really feel that there is some room for improvement with the current implementation, it may be simpler to fix that. Gyula

Re: Flink operator autoscaler scaling down

2023-11-07 Thread Yang LI
Thanks for the information! I haven't tested Kubernetes's built-in rollback mechanism yet. I think I could create a separate, independent operator that monitors the Flink application's JVM memory and triggers a rollback. Another solution I would like to discuss also involves implementing an independent operator.
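
A minimal sketch of the decision logic such an independent watchdog might use, assuming it can sample heap usage (for example from the `Status.JVM.Memory.Heap.Used` and `Status.JVM.Memory.Heap.Max` metrics that Flink reports for each JVM) and that a rollback should fire only when usage stays above a threshold for several consecutive samples. `HeapWatchdog`, `threshold`, and `window` are hypothetical names; how samples are fetched and how the rollback is actually issued are left out.

```python
from collections import deque


class HeapWatchdog:
    """Hypothetical rollback trigger: fires when heap usage stays at or above
    `threshold` (fraction of max heap) for `window` consecutive samples."""

    def __init__(self, threshold: float = 0.9, window: int = 3) -> None:
        if not 0.0 < threshold <= 1.0:
            raise ValueError("threshold must be in (0, 1]")
        if window < 1:
            raise ValueError("window must be >= 1")
        self.threshold = threshold
        self.window = window
        # Only the last `window` over/under flags are kept.
        self._recent: deque = deque(maxlen=window)

    def observe(self, heap_used: int, heap_max: int) -> bool:
        """Record one sample; return True if a rollback should be triggered."""
        over = heap_max > 0 and heap_used / heap_max >= self.threshold
        self._recent.append(over)
        return len(self._recent) == self.window and all(self._recent)


if __name__ == "__main__":
    dog = HeapWatchdog(threshold=0.9, window=3)
    samples = [(800, 1000), (950, 1000), (960, 1000), (990, 1000)]
    print([dog.observe(used, mx) for used, mx in samples])
    # [False, False, False, True]
```

Requiring several consecutive high samples is meant to avoid rolling back on a single short-lived spike before a garbage collection.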

Re: Flink operator autoscaler scaling down

2023-11-06 Thread Gyula Fóra
Hey! Bit of a tricky problem, as in some cases it's not really possible to know whether the job will be able to start with lower parallelism. Custom plugins may work, but that would be an extremely complex solution at this point. The Kubernetes operator has a built-in rollback mechanism that can help
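
Conceptually, a readiness-based rollback works like this: after a spec change, wait up to a timeout for the job to become healthy, and if it never does, redeploy the last spec known to be stable. The sketch below models only that idea, using parallelism as a stand-in for the whole spec. `RollbackTracker` and its fields are hypothetical and do not mirror the operator's internals, which are configured through the operator's own options (see its documentation on upgrade rollbacks).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RollbackTracker:
    """Hypothetical model of 'roll back if not ready within a timeout'."""
    readiness_timeout_s: float
    last_stable_parallelism: int
    pending_parallelism: Optional[int] = None
    upgrade_started_at: Optional[float] = None

    def start_upgrade(self, new_parallelism: int, now: float) -> None:
        """Begin deploying a new (unproven) parallelism."""
        self.pending_parallelism = new_parallelism
        self.upgrade_started_at = now

    def reconcile(self, job_ready: bool, now: float) -> int:
        """Return the parallelism that should be deployed after this check."""
        if self.pending_parallelism is None or self.upgrade_started_at is None:
            return self.last_stable_parallelism
        if job_ready:
            # The new spec proved itself: promote it to the stable spec.
            self.last_stable_parallelism = self.pending_parallelism
        elif now - self.upgrade_started_at < self.readiness_timeout_s:
            # Still within the grace period: keep waiting on the new spec.
            return self.pending_parallelism
        # Either promoted or timed out: clear the pending upgrade.
        self.pending_parallelism = None
        self.upgrade_started_at = None
        return self.last_stable_parallelism


if __name__ == "__main__":
    t = RollbackTracker(readiness_timeout_s=300.0, last_stable_parallelism=8)
    t.start_upgrade(4, now=0.0)
    print(t.reconcile(job_ready=False, now=100.0))  # 4 (still waiting)
    print(t.reconcile(job_ready=False, now=301.0))  # 8 (rolled back)
```

The sketch also reflects the caveat above: it cannot predict whether the lower parallelism will start, it only reacts once the timeout has expired.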

Flink operator autoscaler scaling down

2023-11-06 Thread Yang LI
Dear Flink Community, I am currently working on implementing auto-scaling for my Flink application using the Flink operator's autoscaler. During testing, I encountered a "java.lang.OutOfMemoryError: Java heap space" exception when the autoscaler attempted to scale down. This issue arises when the
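
One way to approach a heap OOM on scale-down is a pre-flight check that refuses to go below the lowest parallelism whose projected per-subtask heap usage still fits; this is one possible form of the "memory protection" idea discussed elsewhere in the thread, whose actual logic is not described there. The sketch assumes, purely for illustration, that heap usage per subtask is a fixed baseline plus an even share of the job's on-heap state (roughly what happens with keyed state on a heap-based state backend, where fewer subtasks each hold more key groups). It ignores slot sharing and GC overhead, and every function and parameter name is hypothetical.

```python
import math


def projected_heap_per_subtask(
    baseline_mib: float, total_state_mib: float, parallelism: int
) -> float:
    """Assumed model: fixed baseline + an even share of the job's heap state."""
    if parallelism < 1:
        raise ValueError("parallelism must be >= 1")
    return baseline_mib + total_state_mib / parallelism


def min_safe_parallelism(
    baseline_mib: float,
    total_state_mib: float,
    heap_budget_mib: float,
    headroom: float = 0.8,
) -> int:
    """Smallest parallelism whose projected usage stays under headroom * budget."""
    usable = heap_budget_mib * headroom - baseline_mib
    if usable <= 0:
        raise ValueError("baseline alone exceeds the usable heap budget")
    return max(1, math.ceil(total_state_mib / usable))


def guard_scale_down(current: int, proposed: int, safe_min: int) -> int:
    """Clamp a proposed scale-down so it never goes below safe_min."""
    if proposed >= current:
        return proposed  # not a scale-down; nothing to guard
    return max(proposed, safe_min)


if __name__ == "__main__":
    safe = min_safe_parallelism(baseline_mib=200, total_state_mib=4096, heap_budget_mib=1024)
    print(safe)  # 7
    print(guard_scale_down(current=12, proposed=4, safe_min=safe))  # 7
```

The headroom factor leaves space for transient allocations; with a 1024 MiB budget and 0.8 headroom, parallelism 7 projects to about 785 MiB per subtask, while 6 would project to about 883 MiB and is rejected.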