Yeah, I was thinking of enforcing that restriction myself by taking the max
too. Anyway, since the change is simple enough, I think it makes sense
to offer that (global scale factor) option, especially for coarse-grained
resource management.
We could create a ticket for that; what do you think, Chen? Maybe you could
just push your changes there? Otherwise, I could send a proposal myself.
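For context, the per-vertex-then-max approach might look roughly like the
sketch below (a minimal illustration with hypothetical names, not the actual
autoscaler code): compute a parallelism per vertex as usual, then apply the
max of those values uniformly to everything except the sources/sinks.

```python
def uniform_parallelism(per_vertex: dict, exclude: frozenset = frozenset()) -> dict:
    """Take the max of the individually computed parallelisms and apply it
    to every vertex except the excluded ones (e.g. sources/sinks), which
    keep their own values. Hypothetical helper, for illustration only."""
    candidates = [p for v, p in per_vertex.items() if v not in exclude]
    uniform = max(candidates)
    return {v: (p if v in exclude else uniform) for v, p in per_vertex.items()}

# Example: "map" and "agg" both get the max (6); source/sink keep theirs.
plan = uniform_parallelism(
    {"source": 2, "map": 4, "agg": 6, "sink": 2},
    exclude=frozenset({"source", "sink"}),
)
```

This is just to make the idea concrete; the real change would live in the
autoscaler's scaling algorithm, as Chen describes below.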

BTW, a friend of mine (señor!) bumped Flink to 1.20 and reported better
task distribution. He will post an update on this soon...

Regards,

Salva



On Fri, Aug 15, 2025 at 1:04 PM Zhanghao Chen <zhanghao.c...@outlook.com>
wrote:

> Not quite possible with the current version. We run an internal
> version of the Autoscaler in our production env. One major difference is
> that we let the whole pipeline (except the sources/sinks) have the same
> parallelism to avoid uneven task distribution. The change is relatively
> simple: just run the algorithm per vertex and take the max of the results.
>
> Best,
> Zhanghao Chen
> ------------------------------
> *From:* Salva Alcántara <salcantara...@gmail.com>
> *Sent:* Thursday, August 14, 2025 12:24
> *To:* user <user@flink.apache.org>
> *Subject:* Re: Autoscaling Global Scaling Factor (???)
>
> That was on my agenda already. Will try and let you know how it goes.
>
> Regarding my questions, do you think it's possible to achieve any of those
> points, so that the autoscaler works as it does when you simply add/remove
> replicas by hand?
>
> Thanks Chen!
>
> Salva
>
> On Thu, Aug 14, 2025 at 2:58 AM Zhanghao Chen <zhanghao.c...@outlook.com>
> wrote:
>
> Hi, you may upgrade Flink to 1.19.3, 1.20.2, or 2.0.1+. There's a known
> issue where the Autoscaler may not minimize the number of TMs during
> downscaling with the adaptive scheduler [1].
>
> [1] https://issues.apache.org/jira/browse/FLINK-33977
>
> Best,
> Zhanghao Chen
>
> ------------------------------
> *From:* Salva Alcántara <salcantara...@gmail.com>
> *Sent:* Wednesday, August 13, 2025 20:56
> *To:* user <user@flink.apache.org>
> *Subject:* RE: Autoscaling Global Scaling Factor (???)
>
> BTW, I'm running Flink 1.18.1 on top of operator 1.12.1 with the
> following autoscaler settings:
>
> ```
>       job.autoscaler.enabled: "true"
>       job.autoscaler.scaling.enabled: "true"
>       job.autoscaler.scale-down.enabled: "true"
>       job.autoscaler.vertex.max-parallelism: "8"
>       job.autoscaler.vertex.min-parallelism: "1"
>       jobmanager.scheduler: adaptive
>       job.autoscaler.metrics.window: 15m
>       job.autoscaler.metrics.busy-time.aggregator: MAX
>       job.autoscaler.backlog-processing.lag-threshold: 2m
>       job.autoscaler.scaling.effectiveness.detection.enabled: "true"
>       job.autoscaler.scaling.effectiveness.threshold: "0.3"
>       job.autoscaler.scaling.event.interval: 10m
>       job.autoscaler.stabilization.interval: 5m
>       job.autoscaler.scale-up.max-factor: "100000.0"
>       job.autoscaler.scaling.key-group.partitions.adjust.mode:
> "EVENLY_SPREAD"
>       job.autoscaler.scale-down.interval: 30m
>       job.autoscaler.scale-down.max-factor: "0.5"
>       job.autoscaler.memory.tuning.scale-down-compensation.enabled: "true"
>       job.autoscaler.catch-up.duration: 5m
>       job.autoscaler.restart.time: 15m
>       job.autoscaler.restart.time-tracking.enabled: "true"
>       job.autoscaler.utilization.target: "0.8"
> ```
>
> Regards,
>
> Salva
>
>
