For the ScalingRealizer, I think having before/after hooks for
`realizeParallelismOverrides` and `realizeConfigOverrides` would be good. These
hooks could be supported via plugins. Thoughts?
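To make the hook idea concrete, here is a minimal sketch that wraps the built-in realizer instead of replacing it. Everything below is hypothetical: `Realizer`, `RealizerHook` and `HookedRealizer` are illustrative stand-ins, and the actual autoscaler interface has a different signature (for example, it also receives a job context and a richer config-change type).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RealizerHooksSketch {

    /** Hypothetical, simplified stand-in for the realizer (not the real signature). */
    interface Realizer {
        void realizeParallelismOverrides(Map<String, String> parallelismOverrides);

        void realizeConfigOverrides(Map<String, String> configOverrides);
    }

    /** Hypothetical plugin interface: every hook defaults to a no-op. */
    interface RealizerHook {
        default void beforeParallelismOverrides(Map<String, String> overrides) {}

        default void afterParallelismOverrides(Map<String, String> overrides) {}

        default void beforeConfigOverrides(Map<String, String> overrides) {}

        default void afterConfigOverrides(Map<String, String> overrides) {}
    }

    /** Decorator: the core realizer always runs; hooks only run around it. */
    static final class HookedRealizer implements Realizer {
        private final Realizer delegate;
        private final List<RealizerHook> hooks;

        HookedRealizer(Realizer delegate, List<RealizerHook> hooks) {
            this.delegate = delegate;
            this.hooks = List.copyOf(hooks);
        }

        @Override
        public void realizeParallelismOverrides(Map<String, String> overrides) {
            hooks.forEach(h -> h.beforeParallelismOverrides(overrides));
            delegate.realizeParallelismOverrides(overrides);
            hooks.forEach(h -> h.afterParallelismOverrides(overrides));
        }

        @Override
        public void realizeConfigOverrides(Map<String, String> overrides) {
            hooks.forEach(h -> h.beforeConfigOverrides(overrides));
            delegate.realizeConfigOverrides(overrides);
            hooks.forEach(h -> h.afterConfigOverrides(overrides));
        }
    }

    /** Records the call order so the before/core/after sequence is visible. */
    static List<String> runDemo() {
        List<String> events = new ArrayList<>();
        Realizer core = new Realizer() {
            public void realizeParallelismOverrides(Map<String, String> o) { events.add("core:" + o); }

            public void realizeConfigOverrides(Map<String, String> o) { events.add("core-config"); }
        };
        RealizerHook logger = new RealizerHook() {
            @Override
            public void beforeParallelismOverrides(Map<String, String> o) { events.add("before"); }

            @Override
            public void afterParallelismOverrides(Map<String, String> o) { events.add("after"); }
        };
        Map<String, String> overrides = new HashMap<>();
        overrides.put("vertex-1", "4");
        new HookedRealizer(core, List.of(logger)).realizeParallelismOverrides(overrides);
        return events;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // [before, core:{vertex-1=4}, after]
    }
}
```

The point of the decorator shape is that a plugin can observe or adjust overrides without being responsible for the realization itself.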


Best,
Diljeet(DJ) Singh

On 2025/08/26 08:24:33 Maximilian Michels wrote:
> Hi Peter,
> 
> First of all, this is a great initiative. Flink Autoscaling definitely
> needs more points of extension. We recently added support for hooking
> into the metric evaluation (FLIP-514), but clearly that is just one
> extension point.
> 
> That said, I think we will need to revise the approach a bit. I'm not
> sure we should be replacing core components. As Gyula mentioned,
> replacing those will easily break the entire autoscaler. Instead, we
> should be adding extension points which allow for meaningful additions
> without breaking the scaling logic. There is already the option to
> replace the entire autoscaling module, if users really want to roll
> out a completely custom version.
> 
> What usually works best is to formulate the use case first, then
> figure out what autoscaler customization would be necessary to
> implement the use case.
> 
> As for making the ScalingRealizer pluggable
> (https://github.com/apache/flink-kubernetes-operator/pull/1020/files),
> I do think that makes sense for some scenarios.
> 
> Cheers,
> Max
> 
> On Tue, Aug 26, 2025 at 8:59 AM Gyula Fóra <gy...@gmail.com> wrote:
> >
> > Hi Peter & Diljeet!
> >
> > My general feedback is that we should try to introduce extension plugins 
> > instead of plugins that completely replace key parts of the autoscaler code.
> >
> > Let me give you a concrete example from FLIP-514 and FLIP-543, using
> > MetricsEvaluator pluggability.
> > The MetricsEvaluator in the autoscaler is responsible for
> > evaluating/deriving/calculating metrics from the collected metrics. It has
> > to calculate everything in a fairly specific way; otherwise, other
> > parts of the autoscaler that depend on these metrics may not work. It
> > doesn't seem very practical or reasonable to completely reimplement this
> > just because someone wants to extend the logic; this is extremely error
> > prone and fragile, especially if the autoscaler logic evolves later.
> >
> > FLIP-514 takes the approach of extending the metric evaluator with a new
> > method that allows users to modify the evaluated metrics at the end and
> > define custom ones. This is the right approach here, as it makes a new
> > extension very simple to build and maintain without interfering with
> > existing logic.
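The FLIP-514-style "extend at the end" idea described above can be sketched roughly as follows. This is a hypothetical illustration, not the FLIP's actual interface: `CustomEvaluator`, `evaluateCore` and the two metric keys are simplified stand-ins, and the real evaluator derives many more metrics with richer types.

```java
import java.util.HashMap;
import java.util.Map;

public class EvaluatorExtensionSketch {

    /** Hypothetical extension point: sees the evaluated metrics, returns additions or overrides. */
    interface CustomEvaluator {
        Map<String, Double> evaluate(Map<String, Double> evaluatedMetrics);
    }

    /** Stand-in for the built-in evaluation; the real autoscaler derives far more metrics. */
    static Map<String, Double> evaluateCore(double recordsInPerSec, double busyTimeMsPerSec) {
        Map<String, Double> metrics = new HashMap<>();
        metrics.put("LOAD", busyTimeMsPerSec / 1000.0);
        metrics.put("TARGET_DATA_RATE", recordsInPerSec);
        return metrics;
    }

    /** Core evaluation always runs first; the optional extension only adjusts its output. */
    static Map<String, Double> evaluate(
            double recordsInPerSec, double busyTimeMsPerSec, CustomEvaluator extension) {
        Map<String, Double> result = evaluateCore(recordsInPerSec, busyTimeMsPerSec);
        if (extension != null) {
            // The extension gets a read-only copy, so it can only change metrics via its return value.
            result.putAll(extension.evaluate(Map.copyOf(result)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical holiday policy: plan for 50% more traffic than currently observed.
        CustomEvaluator holidayBoost =
                m -> Map.of("TARGET_DATA_RATE", m.get("TARGET_DATA_RATE") * 1.5);
        Map<String, Double> result = evaluate(1000.0, 500.0, holidayBoost);
        System.out.println(result.get("LOAD") + " " + result.get("TARGET_DATA_RATE")); // 0.5 1500.0
    }
}
```

Because the core derivation stays in one place, a change to the built-in logic does not require every extension to be rewritten.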
> >
> > FLIP-543 and Diljeet's example PR take the replacement approach,
> > completely substituting entire parts of the implementation (the entire
> > evaluator, scaling realizer, etc.). I think this is not very good for
> > either the community or the actual user. From a community perspective, it
> > makes it harder to extend the logic with nice small additions, and from a
> > user's perspective, it is very error prone when the operator's autoscaler
> > logic changes, as it basically exposes a lot of internal logic through a
> > user-facing interface.
> >
> > So at this point, -1 for the approach in FLIP-543 from my side, but I
> > would love to hear the opinion of others as well.
> >
> > Cheers
> > Gyula
> >
> > On Mon, Aug 25, 2025 at 11:44 PM Peter Huang <hu...@gmail.com> wrote:
> >>
> >> Hi Diljeet,
> >>
> >> Yes, I think we have similar requirements to make the autoscaler even more
> >> powerful so it can handle some customized requirements.
> >> The quick PoC makes sense to me. Let's get some more feedback from the
> >> community.
> >>
> >>
> >>
> >> Best Regards
> >> Peter Huang
> >>
> >>
> >>
> >> On Mon, Aug 25, 2025 at 2:37 PM Peter Huang <hu...@gmail.com>
> >> wrote:
> >>
> >> > Just trying to combine the discussion into one thread.
> >> >
> >> > @Diljeet Singh posted a quick PoC for the proposal:
> >> > https://github.com/apache/flink-kubernetes-operator/pull/1020.
> >> >
> >> >
> >> >
> >> >
> >> > On Mon, Aug 25, 2025 at 7:52 AM Peter Huang <hu...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi Community,
> >> >>
> >> >> Our org has been heavily using the Flink autoscaling algorithm. It
> >> >> greatly reduced our operational overhead and improved cost efficiency,
> >> >> as users always over-provision resources when onboarding. Recently, we
> >> >> have had some requirements to customize the autoscaling algorithm
> >> >> for different scenarios, for example, handling large but predictable
> >> >> traffic spikes during the holiday season, or increasing the checkpoint
> >> >> interval together with a scale-up for streaming ingestion use cases.
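As a rough illustration of the second use case, the policy "longer checkpoint interval whenever we scale up" could be expressed as a small function producing config overrides. This is a hypothetical sketch: `isScaleUp` and `configOverridesFor` are made-up helpers, and only the option key `execution.checkpointing.interval` is a real Flink setting; the interval value is illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class CheckpointOnScaleUpSketch {

    // Real Flink option key; the value passed in below is illustrative only.
    static final String CHECKPOINT_INTERVAL = "execution.checkpointing.interval";

    /** True if any vertex gets a higher parallelism; unknown vertices count as starting from 0. */
    static boolean isScaleUp(Map<String, Integer> current, Map<String, Integer> proposed) {
        return proposed.entrySet().stream()
                .anyMatch(e -> e.getValue() > current.getOrDefault(e.getKey(), 0));
    }

    /** Hypothetical policy: pair any scale-up with a longer checkpoint interval. */
    static Map<String, String> configOverridesFor(
            Map<String, Integer> current, Map<String, Integer> proposed, String scaleUpInterval) {
        Map<String, String> overrides = new HashMap<>();
        if (isScaleUp(current, proposed)) {
            overrides.put(CHECKPOINT_INTERVAL, scaleUpInterval);
        }
        return overrides;
    }

    public static void main(String[] args) {
        Map<String, Integer> current = Map.of("source", 2, "sink", 2);
        Map<String, Integer> proposed = Map.of("source", 4, "sink", 2);
        System.out.println(configOverridesFor(current, proposed, "10 min"));
        // {execution.checkpointing.interval=10 min}
    }
}
```

Something like this would fit naturally into an "after parallelism overrides" hook rather than a full realizer replacement.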
> >> >>
> >> >> We searched through the discussions on this topic in the mailing list,
> >> >> including the existing FLIP-514
> >> >> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-514%3A+Custom+Evaluator+plugin+for+Flink+Autoscaler>.
> >> >> It looks like that discussion has not been finalized yet.
> >> >> To accelerate the process, we adopted and combined the
> >> >> existing opinions from the community and created a proposal in FLIP-543
> >> >> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-543%3A+Support+Customized+Autoscale+Algorithm>.
> >> >> The basic idea
> >> >> is to make some core components of the autoscaler pluggable, for example,
> >> >> MetricsCollector, MetricsEvaluator, and ScalingRealizer, while keeping
> >> >> the core logic skeleton of the autoscaler (which has already been proven
> >> >> by a large number of users) untouched.
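The "pluggable components, fixed skeleton" idea in FLIP-543 can be pictured as a template-style loop with injected parts. This is a deliberately simplified, hypothetical sketch: `Collector`, `Evaluator`, `Realizer` and `AutoscalerSkeleton` are illustrative names, not the autoscaler's real interfaces, and the real loop also computes new parallelism before realizing it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PluggableSkeletonSketch {

    // Hypothetical, heavily simplified component interfaces (not the autoscaler's real ones).
    interface Collector { Map<String, Double> collect(); }

    interface Evaluator { Map<String, Double> evaluate(Map<String, Double> collected); }

    interface Realizer { void realize(Map<String, Double> evaluated); }

    /** The control flow is fixed here; only the injected components vary. */
    static final class AutoscalerSkeleton {
        private final Collector collector;
        private final Evaluator evaluator;
        private final Realizer realizer;

        AutoscalerSkeleton(Collector collector, Evaluator evaluator, Realizer realizer) {
            this.collector = collector;
            this.evaluator = evaluator;
            this.realizer = realizer;
        }

        void runOnce() {
            // Simplified: the real loop also decides new parallelism before realizing it.
            realizer.realize(evaluator.evaluate(collector.collect()));
        }
    }

    /** Wires trivial lambdas into the skeleton and returns what the realizer received. */
    static List<Map<String, Double>> runDemo() {
        List<Map<String, Double>> realized = new ArrayList<>();
        AutoscalerSkeleton skeleton = new AutoscalerSkeleton(
                () -> Map.of("busyTimeMsPerSecond", 800.0),
                collected -> Map.of("LOAD", collected.get("busyTimeMsPerSecond") / 1000.0),
                realized::add);
        skeleton.runOnce();
        return realized;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // [{LOAD=0.8}]
    }
}
```

Note that in this shape a replacement Evaluator must reproduce every metric downstream steps rely on, which is the fragility Gyula raises in his reply.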
> >> >>
> >> >> Looking forward to any feedback and opinions on FLIP-543.
> >> >>
> >> >> [1]
> >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-543%3A+Support+Customized+Autoscale+Algorithm
> >> >> [2]
> >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-514%3A+Custom+Evaluator+plugin+for+Flink+Autoscaler
> >> >> [3] Other related discussion thread
> >> >>
> >> >> https://lists.apache.org/thread/749l74z1h5jylkxrw3rtjmxcj2t9p7ws
> >> >>
> >> >> https://lists.apache.org/thread/mcd7jcn4kz6oqtyqq5hfycjf9mqh6c53
> >> >>
> >> >>
> >> >> Best Regards
> >> >> Peter Huang
> >> >>
> >> >
> 
