Yeah, I was thinking of enforcing that restriction myself by taking the max too. Anyway, since the change is simple enough, I think it makes sense to offer that (global scale factor) option, especially for coarse-grained resource management. We could create a ticket for that; what do you think, Chen? Maybe you could just push your changes there? Otherwise, I could send a proposal myself.
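To make the idea concrete, I imagine something along these lines as a post-processing step on top of the per-vertex decisions (just a sketch; the class and method names below are made up for illustration and don't correspond to the actual flink-autoscaler code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the flink-autoscaler API.
public class UniformParallelismSketch {

    // Flatten the per-vertex parallelism decisions to a single value (the max),
    // so that every vertex except the excluded ones (e.g. sources/sinks) ends
    // up with the same parallelism.
    public static Map<String, Integer> flattenToMax(
            Map<String, Integer> perVertexParallelism, Set<String> excludedVertexIds) {

        int max =
                perVertexParallelism.entrySet().stream()
                        .filter(e -> !excludedVertexIds.contains(e.getKey()))
                        .mapToInt(Map.Entry::getValue)
                        .max()
                        .orElse(1);

        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : perVertexParallelism.entrySet()) {
            // Excluded vertices keep their own decision; everything else gets the max.
            boolean excluded = excludedVertexIds.contains(e.getKey());
            result.put(e.getKey(), excluded ? e.getValue() : max);
        }
        return result;
    }
}
```

Whether this should be exposed behind a dedicated config option (and how sources/sinks should be excluded) is exactly the kind of thing the ticket could discuss.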
BTW, a friend of mine (señor!) bumped Flink to 1.20 and reported better task distribution. He will post an update on this soon...

Regards,

Salva

On Fri, Aug 15, 2025 at 1:04 PM Zhanghao Chen <zhanghao.c...@outlook.com> wrote:

> Not quite possible based on the current version. We run an internal
> version of the Autoscaler in our production env. One major diff is that we
> let the whole pipeline (except the source/sink) have the same
> parallelism to avoid uneven task distribution. The change is relatively
> simple: just run the algorithm per vertex and take the max of them.
>
> Best,
> Zhanghao Chen
> ------------------------------
> *From:* Salva Alcántara <salcantara...@gmail.com>
> *Sent:* Thursday, August 14, 2025 12:24
> *To:* user <user@flink.apache.org>
> *Subject:* Re: Autoscaling Global Scaling Factor (???)
>
> That was on my agenda already. Will try and let you know how it goes.
>
> Regarding my questions, do you think it's possible to achieve any of those
> points to make the autoscaler work the same as when you simply add/remove
> replicas by hand?
>
> Thanks Chen!
>
> Salva
>
> On Thu, Aug 14, 2025 at 2:58 AM Zhanghao Chen <zhanghao.c...@outlook.com>
> wrote:
>
> Hi, you may upgrade Flink to 1.19.3, 1.20.2, or 2.0.1+. There's a known
> issue where the Autoscaler may not minimize the number of TMs during
> downscaling with the adaptive scheduler [1].
>
> [1] https://issues.apache.org/jira/browse/FLINK-33977
>
> Best,
> Zhanghao Chen
>
> ------------------------------
> *From:* Salva Alcántara <salcantara...@gmail.com>
> *Sent:* Wednesday, August 13, 2025 20:56
> *To:* user <user@flink.apache.org>
> *Subject:* RE: Autoscaling Global Scaling Factor (???)
>
> BTW, I'm running Flink 1.18.1 on top of operator 1.12.1 with the
> following autoscaler settings:
>
> ```
> job.autoscaler.enabled: "true"
> job.autoscaler.scaling.enabled: "true"
> job.autoscaler.scale-down.enabled: "true"
> job.autoscaler.vertex.max-parallelism: "8"
> job.autoscaler.vertex.min-parallelism: "1"
> jobmanager.scheduler: adaptive
> job.autoscaler.metrics.window: 15m
> job.autoscaler.metrics.busy-time.aggregator: MAX
> job.autoscaler.backlog-processing.lag-threshold: 2m
> job.autoscaler.scaling.effectiveness.detection.enabled: "true"
> job.autoscaler.scaling.effectiveness.threshold: "0.3"
> job.autoscaler.scaling.event.interval: 10m
> job.autoscaler.stabilization.interval: 5m
> job.autoscaler.scale-up.max-factor: "100000.0"
> job.autoscaler.scaling.key-group.partitions.adjust.mode: "EVENLY_SPREAD"
> job.autoscaler.scale-down.interval: 30m
> job.autoscaler.scale-down.max-factor: "0.5"
> job.autoscaler.memory.tuning.scale-down-compensation.enabled: "true"
> job.autoscaler.catch-up.duration: 5m
> job.autoscaler.restart.time: 15m
> job.autoscaler.restart.time-tracking.enabled: "true"
> job.autoscaler.utilization.target: "0.8"
> ```
>
> Regards,
>
> Salva