Re: [DISCUSS] FLIP-543: Support Customized Autoscale Algorithm

Rui Fan Thu, 28 Aug 2025 02:27:18 -0700

Hi everyone,

Thanks for the productive conversation on FLIP-543.


I agree that we need more extensibility in the autoscaler. The predictive
scaling use case is a perfect example of a powerful feature that would help
many of us improve job availability by scaling before backlogs build up.

To echo Gyula and Max's points, I also believe the best path forward is to
build this capability as an extension to the existing framework, not as a
replacement. This would offer a robust, community-driven solution for a
common problem, which feels more sustainable than asking users to implement
and maintain custom forks of the logic.

Best,
Rui

On Thu, Aug 28, 2025 at 7:14 AM Pradeepta Choudhury
<pchoudhur...@apple.com.invalid> wrote:

> Hello Peter,
>
> To start with, great initiative! But I share the concern already raised that
> creating too many extension points can compromise the autoscaler's
> functionality.
> When we proposed FLIP-514 [1] and a custom evaluator, the aim was twofold:
> provide the required extension point and ship practical strategies as
> plugins. At the same time, we wanted to preserve flexibility for advanced,
> highly specific scenarios, like predictive scaling, that differ by
> ecosystem, platform, and company. Our thinking was that the custom evaluator
> strikes that balance: it lets users adjust the evaluated metrics (especially
> TARGET_DATA_RATE) that drive the scale-factor calculation, enabling useful
> out-of-the-box behavior without constraining bespoke implementations.
> One of the desired outcomes we had set for FLIP-514 was to ship a
> scheduled-scaling strategy as a plugin, leveraging a baseline period and
> explicit scheduled windows to drive planned capacity changes. I've been
> away since last month due to personal commitments. I plan to resume after
> the first week of September and will complete the scheduled-scaling plugin
> to wrap up the custom evaluator.
> Making the ScalingRealizer pluggable
> (https://github.com/apache/flink-kubernetes-operator/pull/1020/files)
> definitely sounds helpful for certain scenarios.
> That said, I fully agree with the general approach Gyula suggested: solve
> specific issues independently in the "best possible way" first, and then
> arrive at a good pluggability design that can serve as a foundation for
> future use cases.
>
>
> Thanks and Regards
> Pradeepta
>
>
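A minimal sketch of the scheduled-scaling idea above, assuming a simple model in which each scheduled window multiplies a baseline target data rate. The names `ScheduledScalingSketch`, `ScheduledWindow`, and `adjustedTargetDataRate` are hypothetical and are not the FLIP-514 evaluator API; the sketch only shows how an adjusted TARGET_DATA_RATE could be derived before the scale-factor calculation.

```java
import java.time.Instant;
import java.util.List;

// Hypothetical sketch, NOT the FLIP-514 API: the "baseline rate times
// window multiplier" model is an illustrative assumption.
final class ScheduledScalingSketch {

    /** A planned capacity window [start, end) with a rate multiplier. */
    static final class ScheduledWindow {
        final Instant start;
        final Instant end;
        final double multiplier;

        ScheduledWindow(Instant start, Instant end, double multiplier) {
            this.start = start;
            this.end = end;
            this.multiplier = multiplier;
        }

        // Start is inclusive, end is exclusive.
        boolean contains(Instant t) {
            return !t.isBefore(start) && t.isBefore(end);
        }
    }

    /**
     * Returns the target data rate for the scale-factor calculation: the
     * baseline rate scaled by the first active window's multiplier, or the
     * unchanged baseline when no window is active.
     */
    static double adjustedTargetDataRate(
            double baselineRate, Instant now, List<ScheduledWindow> windows) {
        for (ScheduledWindow w : windows) {
            if (w.contains(now)) {
                return baselineRate * w.multiplier;
            }
        }
        return baselineRate;
    }
}
```

Returning the baseline outside any window leaves the default autoscaler behavior untouched, which is the "extension, not replacement" property argued for in this thread.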
> > On 26 Aug 2025, at 6:05 PM, ctrlaltd...@icloud.com.invalid <ctrlaltd...@icloud.com.INVALID> wrote:
> >
> > For the ScalingRealizer, I think before/after hooks for
> > `realizeParallelismOverrides` and `realizeConfigOverrides` would be good.
> > Plugins could then implement these hooks. Thoughts?
> >
> >
> > Best,
> > Diljeet(DJ) Singh
> >
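A minimal sketch of the before/after hook idea, assuming simplified stand-in interfaces: `SimpleRealizer`, `RealizerHook`, `HookedRealizer`, and `CheckpointIntervalHook` are hypothetical names, and the real `ScalingRealizer` methods take a job context and richer argument types than the plain string maps used here. The decorator keeps the built-in realizer in charge and only runs plugin callbacks around it. The example hook mirrors the "increase checkpoint interval together with scale up" use case from the original proposal, with an arbitrary 5-minute value.

```java
import java.util.List;
import java.util.Map;

// Simplified, hypothetical stand-in for the autoscaler's ScalingRealizer.
interface SimpleRealizer {
    void realizeParallelismOverrides(Map<String, String> overrides);
    void realizeConfigOverrides(Map<String, String> overrides);
}

// Hypothetical plugin hook: all callbacks default to no-ops, so a plugin
// only overrides the ones it needs.
interface RealizerHook {
    default void beforeParallelismOverrides(Map<String, String> overrides) {}
    default void afterParallelismOverrides(Map<String, String> overrides) {}
    default void beforeConfigOverrides(Map<String, String> overrides) {}
    default void afterConfigOverrides(Map<String, String> overrides) {}
}

// Decorator: the built-in realizer still does the work; hooks run around it.
final class HookedRealizer implements SimpleRealizer {
    private final SimpleRealizer delegate;
    private final List<RealizerHook> hooks;

    HookedRealizer(SimpleRealizer delegate, List<RealizerHook> hooks) {
        this.delegate = delegate;
        this.hooks = List.copyOf(hooks);
    }

    @Override
    public void realizeParallelismOverrides(Map<String, String> overrides) {
        hooks.forEach(h -> h.beforeParallelismOverrides(overrides));
        delegate.realizeParallelismOverrides(overrides);
        hooks.forEach(h -> h.afterParallelismOverrides(overrides));
    }

    @Override
    public void realizeConfigOverrides(Map<String, String> overrides) {
        hooks.forEach(h -> h.beforeConfigOverrides(overrides));
        delegate.realizeConfigOverrides(overrides);
        hooks.forEach(h -> h.afterConfigOverrides(overrides));
    }
}

// Example hook: add a checkpoint-interval override before config changes
// are applied, unless one is already present. The value is illustrative.
final class CheckpointIntervalHook implements RealizerHook {
    @Override
    public void beforeConfigOverrides(Map<String, String> overrides) {
        overrides.putIfAbsent("execution.checkpointing.interval", "5 min");
    }
}
```

Because hooks never replace the delegate, a plugin can add behavior around the core realization step but cannot swap that step out.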
> > On 2025/08/26 08:24:33 Maximilian Michels wrote:
> >> Hi Peter,
> >>
> >> First of all, this is a great initiative. Flink Autoscaling definitely
> >> needs more points of extension. We recently added support for hooking
> >> into the metric evaluation (FLIP-514), but clearly that is just one
> >> extension point.
> >>
> >> That said, I think we will need to revise the approach a bit. I'm not
> >> sure we should be replacing core components. As Gyula mentioned,
> >> replacing those will easily break the entire autoscaler. Instead, we
> >> should be adding extension points which allow for meaningful additions
> >> without breaking the scaling logic. There is already the option to
> >> replace the entire autoscaling module, if users really want to roll
> >> out a completely custom version.
> >>
> >> What usually works best is to formulate the use case first, then
> >> figure out what autoscaler customization would be necessary to
> >> implement the use case.
> >>
> >> As for making the ScalingRealizer pluggable
> >> (https://github.com/apache/flink-kubernetes-operator/pull/1020/files),
> >> I do think that makes sense for some scenarios.
> >>
> >> Cheers,
> >> Max
> >>
> >> On Tue, Aug 26, 2025 at 8:59 AM Gyula Fóra <gy...@gmail.com> wrote:
> >>>
> >>> Hi Peter & Diljeet!
> >>>
> >>> My general feedback is that we should try to introduce extension
> >>> plugins instead of plugins that completely replace key parts of the
> >>> autoscaler code.
> >>>
> >>> Let me give you a concrete example through FLIP-514 and FLIP-543 using
> >>> the MetricsEvaluator pluggability.
> >>> The MetricsEvaluator in the autoscaler is responsible for
> >>> evaluating/deriving/calculating metrics from the collected metrics. It
> >>> has to calculate everything in a more or less specific way, otherwise
> >>> other parts of the autoscaler that depend on these metrics may not work.
> >>> It doesn't seem very practical/reasonable to completely reimplement this
> >>> just because someone wants to extend the logic; this is extremely
> >>> error-prone and fragile, especially if the autoscaler logic later
> >>> evolves.
> >>>
> >>> FLIP-514 takes the approach of extending the metric evaluator with a new
> >>> method that lets users modify the evaluated metrics at the end of the
> >>> evaluation and define custom ones. This is the right approach here, as it
> >>> makes a new extension very simple to build and maintain without
> >>> interfering with existing logic.
> >>>
> >>> The approach in FLIP-543 and in Diljeet's example PR is one of
> >>> replacement: completely substituting entire parts of the implementation
> >>> (the entire evaluator, scaling realizer, etc.). I think this is not very
> >>> good for either the community or the actual user. From a community
> >>> perspective, it makes it harder to extend the logic with nice small
> >>> additions; from a user's perspective, it is very error-prone if the
> >>> operator autoscaler logic changes, as it basically exposes a lot of
> >>> internal logic through a user-facing interface.
> >>>
> >>> So at this point, -1 for the approach in FLIP-543 from my side, but I
> >>> would love to hear the opinion of others as well.
> >>>
> >>> Cheers
> >>> Gyula
> >>>
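A minimal sketch of the "extend, don't replace" pattern described for FLIP-514, assuming simplified types: `EvaluatorExtension` and `ExtensibleEvaluator` are hypothetical names, metrics are plain `String` to `Double` maps, and the `source.rate` input key and the one-line built-in calculation are placeholders rather than the real evaluator logic. The built-in evaluation always runs first; the extension sees a read-only copy of the results and can only add or override entries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "extend, don't replace" pattern; the types and
// names do not mirror the actual FLIP-514 interfaces.
interface EvaluatorExtension {
    /** Receives the built-in results; returns entries to add or override. */
    Map<String, Double> customize(Map<String, Double> evaluated);
}

final class ExtensibleEvaluator {
    private final EvaluatorExtension extension;

    ExtensibleEvaluator(EvaluatorExtension extension) {
        this.extension = extension;
    }

    Map<String, Double> evaluate(Map<String, Double> collected) {
        // 1. Built-in evaluation always runs (placeholder calculation here).
        Map<String, Double> evaluated = new HashMap<>();
        evaluated.put("TARGET_DATA_RATE", collected.getOrDefault("source.rate", 0.0));

        // 2. The extension gets a read-only copy and can only add/override entries.
        evaluated.putAll(extension.customize(Map.copyOf(evaluated)));
        return evaluated;
    }
}
```

A no-op extension (`m -> Map.of()`) reproduces the built-in results exactly, which is what keeps this kind of plugin cheap to maintain as the core logic evolves.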
> >>> On Mon, Aug 25, 2025 at 11:44 PM Peter Huang <hu...@gmail.com> wrote:
> >>>>
> >>>> Hi Diljeet,
> >>>>
> >>>> Yes, I think we have similar needs: making the autoscaler even more
> >>>> powerful so it can handle customized requirements.
> >>>> The quick PoC makes sense to me. Let's get some more feedback from the
> >>>> community.
> >>>>
> >>>>
> >>>>
> >>>> Best Regards
> >>>> Peter Huang
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Aug 25, 2025 at 2:37 PM Peter Huang <hu...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Just trying to combine the discussion into one thread.
> >>>>>
> >>>>> @Diljeet Singh posted a quick PoC for the proposal:
> >>>>> https://github.com/apache/flink-kubernetes-operator/pull/1020
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Mon, Aug 25, 2025 at 7:52 AM Peter Huang <hu...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Community,
> >>>>>>
> >>>>>> Our org has been using the Flink autoscaling algorithm heavily. It
> >>>>>> has greatly reduced our operational overhead and improved cost
> >>>>>> efficiency, since users always over-provision resources when they
> >>>>>> onboard. Recently, we have had requirements to customize the
> >>>>>> autoscaling algorithm for different scenarios, for example, handling
> >>>>>> large but predictable traffic spikes during the holiday season, or
> >>>>>> increasing the checkpoint interval together with a scale-up for
> >>>>>> streaming ingestion use cases.
> >>>>>>
> >>>>>> We searched through the discussions about this topic on the mailing
> >>>>>> list, including the existing FLIP-514 [2]. It looks like that
> >>>>>> discussion is not finalized yet.
> >>>>>> To accelerate the process, we adopted and combined the existing
> >>>>>> opinions from the community and created a proposal in FLIP-543 [1].
> >>>>>> The basic idea is to make some core components of the autoscaler
> >>>>>> pluggable, for example, MetricsCollector, MetricsEvaluator, and
> >>>>>> ScalingRealizer, while keeping the core logic skeleton of the
> >>>>>> autoscaler (which is already well proven by a large number of users)
> >>>>>> untouched.
> >>>>>>
> >>>>>> Looking forward to any feedback and opinions on FLIP-543.
> >>>>>>
> >>>>>> [1] FLIP-543: Support Customized Autoscale Algorithm
> >>>>>>     https://cwiki.apache.org/confluence/display/FLINK/FLIP-543%3A+Support+Customized+Autoscale+Algorithm
> >>>>>> [2] FLIP-514: Custom Evaluator plugin for Flink Autoscaler
> >>>>>>     https://cwiki.apache.org/confluence/display/FLINK/FLIP-514%3A+Custom+Evaluator+plugin+for+Flink+Autoscaler
> >>>>>> [3] Other related discussion threads:
> >>>>>>     https://lists.apache.org/thread/749l74z1h5jylkxrw3rtjmxcj2t9p7ws
> >>>>>>     https://lists.apache.org/thread/mcd7jcn4kz6oqtyqq5hfycjf9mqh6c53
> >>>>>>
> >>>>>> Best Regards
> >>>>>> Peter Huang
> >>>>>>
> >>>>>
>
>
