Hi everyone,

Thanks for the productive conversation on FLIP-543.
I agree that we need more extensibility in the autoscaler. The predictive scaling use case is a perfect example of a powerful feature that would help many of us improve job availability by scaling before backlogs build up.

To echo Gyula and Max's points, I also believe the best path forward is to build this capability as an extension to the existing framework, not as a replacement. This would offer a robust, community-driven solution for a common problem, which feels more sustainable than asking users to implement and maintain custom forks of the logic.

Best,
Rui

On Thu, Aug 28, 2025 at 7:14 AM Pradeepta Choudhury <pchoudhur...@apple.com.invalid> wrote:

> Hello Peter,
>
> To start with, great initiative! But I echo the concern already raised that creating too many extension points can compromise the autoscaler functionality.
> When we proposed FLIP-514 [1] and a custom evaluator, the aim was twofold: provide the required extension point and ship practical strategies as pluggables. At the same time, we wanted to preserve flexibility for advanced, highly specific scenarios, like predictive scaling, that differ by ecosystem, platform, and company. The thought process was that the custom evaluator strikes that balance: it lets users adjust the evaluated metrics, especially TARGET_DATA_RATE, that drive the scale-factor calculation, enabling useful out-of-the-box behavior without constraining bespoke implementations.
> One of the desired outcomes we had set for FLIP-514 was to ship a scheduled-scaling strategy as a pluggable, leveraging a baseline period and explicit scheduled windows to drive planned capacity changes. I’ve been away since last month due to personal commitments. I plan to resume after the first week of September and will complete the scheduled-scaling plugin to wrap up the custom evaluator.
> Having the ScalingRealizer pluggable (https://github.com/apache/flink-kubernetes-operator/pull/1020/files) definitely sounds helpful for certain scenarios.
> But I totally agree with the general approach suggested by Gyula: solve specific issues independently in the "best possible way", and then arrive at a good solution for pluggability that could be a foundation for future use cases.
>
> Thanks and Regards
> Pradeepta
>
> > On 26 Aug 2025, at 6:05 PM, ctrlaltd...@icloud.com.invalid <ctrlaltd...@icloud.com.INVALID> wrote:
> >
> > From the ScalingRealizer, I think having before/after hooks for `realizeParallelismOverrides` and `realizeConfigOverrides` would be good. We can support these hooks from plugins, thoughts?
> >
> > Best,
> > Diljeet(DJ) Singh
> >
> > On 2025/08/26 08:24:33 Maximilian Michels wrote:
> >> Hi Peter,
> >>
> >> First of all, this is a great initiative. Flink Autoscaling definitely needs more points of extension. We recently added support for hooking into the metric evaluation (FLIP-514), but clearly that is just one extension point.
> >>
> >> That said, I think we will need to revise the approach a bit. I'm not sure we should be replacing core components. As Gyula mentioned, replacing those will easily break the entire autoscaler. Instead, we should be adding extension points which allow for meaningful additions without breaking the scaling logic. There is already the option to replace the entire autoscaling module, if users really want to roll out a completely custom version.
> >>
> >> What usually works best is to formulate the use case first, then figure out what autoscaler customization would be necessary to implement the use case.
> >>
> >> As for making the ScalingRealizer pluggable (https://github.com/apache/flink-kubernetes-operator/pull/1020/files), I do think that makes sense for some scenarios.
> >>
> >> Cheers,
> >> Max
> >>
> >> On Tue, Aug 26, 2025 at 8:59 AM Gyula Fóra <gy...@gmail.com> wrote:
> >>>
> >>> Hi Peter & Diljeet!
> >>>
> >>> My general feedback is that we should try to introduce extension plugins instead of plugins that completely replace key parts of the autoscaler code.
> >>>
> >>> Let me give you a concrete example through FLIP-514 and FLIP-543 using the MetricsEvaluator pluggability.
> >>> The MetricsEvaluator in the autoscaler is responsible for evaluating/deriving/calculating metrics from the collected metrics. It has to calculate everything in a more or less specific way, otherwise other parts of the autoscaler that depend on these metrics may not work. It doesn't seem very practical/reasonable to completely reimplement this just because someone wants to extend the logic. This is extremely error prone and fragile, especially if the autoscaler logic later evolves.
> >>>
> >>> FLIP-514 takes the approach of extending the metric evaluator with a new method that allows users to modify the evaluated metrics at the end and define custom ones. This is the right approach here as it makes a new extension very simple to build and maintain without interfering with existing logic.
> >>>
> >>> The approach in FLIP-543 and in Diljeet's example PR takes the replacement approach and completely substitutes entire parts of the implementation (the entire evaluator, scaling realizer etc). I think this is not very good for either the community or the actual user. From a community perspective it makes it harder to extend the logic with nice small additions, and from a user's perspective it is very error prone if the operator autoscaler logic changes, as it basically exposes a lot of internal logic on a user interface.
> >>>
> >>> So at this point, -1 for the approach in FLIP-543 from my side, but I would love to hear the opinion of others as well.
> >>>
> >>> Cheers
> >>> Gyula
> >>>
> >>> On Mon, Aug 25, 2025 at 11:44 PM Peter Huang <hu...@gmail.com> wrote:
> >>>>
> >>>> Hi Diljeet,
> >>>>
> >>>> Yes, I think we have similar requirements to make the autoscaler even more powerful so it can handle some customized requirements.
> >>>> The quick PoC makes sense to me. Let's get some more feedback from the community.
> >>>>
> >>>> Best Regards
> >>>> Peter Huang
> >>>>
> >>>> On Mon, Aug 25, 2025 at 2:37 PM Peter Huang <hu...@gmail.com> wrote:
> >>>>
> >>>>> Just trying to combine the discussion into one thread.
> >>>>>
> >>>>> @Diljeet Singh
> >>>>> Posted a quick PoC for the proposal
> >>>>> https://github.com/apache/flink-kubernetes-operator/pull/1020.
> >>>>>
> >>>>> On Mon, Aug 25, 2025 at 7:52 AM Peter Huang <hu...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Community,
> >>>>>>
> >>>>>> Our org has been heavily using the Flink autoscaling algorithm. It greatly reduced our operation overhead and improved cost efficiency, as users always over-provision resources when onboarding. Recently, we have had some requirements to customize the autoscaling algorithm for different scenarios, for example, during the holiday season's large but predictable traffic spike, increase checkpoint interval together with scale up for streaming ingestion use cases.
> >>>>>>
> >>>>>> We searched through the discussion about the topic on the mailing list, including the existing FLIP-514 <https://cwiki.apache.org/confluence/display/FLINK/FLIP-514%3A+Custom+Evaluator+plugin+for+Flink+Autoscaler>.
> >>>>>> Looks like the discussion is not finalized yet.
> >>>>>> To accelerate the process, we adopted and combined the existing opinions from the community and created a proposal in FLIP-543 <https://cwiki.apache.org/confluence/display/FLINK/FLIP-543%3A+Support+Customized+Autoscale+Algorithm>.
> >>>>>> The basic idea is to make some core components of the autoscaler pluggable, for example, MetricsCollector, MetricsEvaluator, and ScalingRealizer, and at the same time keep the core logic skeleton of the autoscaler (which is already well justified by a large number of users) untouched.
> >>>>>>
> >>>>>> Looking forward to any feedback and opinions on FLIP-543.
> >>>>>>
> >>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-543%3A+Support+Customized+Autoscale+Algorithm
> >>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-514%3A+Custom+Evaluator+plugin+for+Flink+Autoscaler
> >>>>>> [3] Other related discussion threads
> >>>>>> https://lists.apache.org/thread/749l74z1h5jylkxrw3rtjmxcj2t9p7ws
> >>>>>> https://lists.apache.org/thread/mcd7jcn4kz6oqtyqq5hfycjf9mqh6c53
> >>>>>>
> >>>>>> Best Regards
> >>>>>> Peter Huang
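
For readers following the thread, here is a minimal Java sketch of the extension-style hook that Gyula and Pradeepta describe: the core evaluation runs first, and a plugin only adjusts the already evaluated metrics afterwards (here, raising TARGET_DATA_RATE to a forecast for predictive scaling). Every name in it (ExtensionHookSketch, Metric, EvaluatedMetricsHook, evaluate) is hypothetical and invented for illustration. It is not the FLIP-514 interface or any real Flink Kubernetes Operator API, and the "core" computation is a placeholder.

```java
import java.util.EnumMap;
import java.util.Map;

final class ExtensionHookSketch {

    // Hypothetical metric keys; only TARGET_DATA_RATE is named in the thread.
    enum Metric { TARGET_DATA_RATE, PROCESSING_RATE }

    // Hypothetical extension point: invoked once, after the core evaluation.
    interface EvaluatedMetricsHook {
        void adjust(Map<Metric, Double> evaluated);
    }

    // Placeholder for the core evaluator. The core logic stays in one place
    // and is never replaced; the hook can only modify its result.
    static Map<Metric, Double> evaluate(
            double observedInputRate, double processingRate, EvaluatedMetricsHook hook) {
        Map<Metric, Double> evaluated = new EnumMap<>(Metric.class);
        evaluated.put(Metric.TARGET_DATA_RATE, observedInputRate);
        evaluated.put(Metric.PROCESSING_RATE, processingRate);
        hook.adjust(evaluated);
        return evaluated;
    }

    public static void main(String[] args) {
        // Assumed forecast value, standing in for a real prediction source.
        double forecastRate = 1500.0;

        // Predictive hook: never lowers the target, only raises it to the forecast.
        EvaluatedMetricsHook predictive =
                m -> m.merge(Metric.TARGET_DATA_RATE, forecastRate, Double::max);

        Map<Metric, Double> result = evaluate(1000.0, 800.0, predictive);
        System.out.println("TARGET_DATA_RATE = " + result.get(Metric.TARGET_DATA_RATE));
    }
}
```

The point of the shape is the one made in the thread: the plugin sees and edits evaluated metrics but does not reimplement how they are derived, so changes to the core evaluation do not force plugin authors to rewrite their code.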