Varun created FLINK-38538:
-----------------------------
Summary: Flink Autoscaler: Need Option to Autoscale Based
Primarily (or Solely) on Busy Metric, Not Output Ratio
Key: FLINK-38538
URL: https://issues.apache.org/jira/browse/FLINK-38538
Project: Flink
Issue Type: Technical Debt
Components: Autoscaler
Affects Versions: 1.19.0
Reporter: Varun
*Components:* Autoscaler, Flink Kubernetes Operator
*Affects Versions:*
* Flink: 1.19.0
* Flink Kubernetes Operator: 1.13.1
Deployed via a FlinkDeployment CR on an on-premises, Rancher-managed Kubernetes
cluster running the Flink Kubernetes Operator.
*Environment:*
* Flink job launched as a FlinkDeployment in application mode on an on-premises
Kubernetes cluster managed by Rancher.
Typical deployment config (excerpt):
{code:yaml}
job.autoscaler.enabled: "true"
jobmanager.scheduler: adaptive
job.autoscaler.stabilization.interval: "1m"
job.autoscaler.metrics.window: "2m"
job.autoscaler.utilization.target: "0.4"
job.autoscaler.utilization.max: "0.6"
job.autoscaler.utilization.min: "0.2"
job.autoscaler.vertex.min-parallelism: "10"
job.autoscaler.vertex.max-parallelism: "200"
job.autoscaler.scale-up.grace-period: "1m"
job.autoscaler.metrics.busy-time.aggregator: "MAX"
{code}
*Problem Statement:*
In the current autoscaler implementation, scaling decisions are made using both
operator busy time ({*}busyTimeMsPerSecond{*}) and the *output ratio* (i.e.,
the ratio of output records to input records per edge), combined as part of the
recursive target data rate calculation.
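To make that combination concrete, here is a minimal, hypothetical Java sketch
(all class and method names are invented; this is not the actual
flink-autoscaler code) of how a near-zero output ratio can zero out the target
rate even when the vertex is fully busy:
{code:java}
// Simplified, self-contained sketch of the two signals the autoscaler combines.
// Purely illustrative of the reported behavior, not the real implementation.
public class AutoscalerSketch {

    // Target rate for a vertex is derived from its upstream target rate scaled
    // by the observed output ratio of the incoming edge (records out / records
    // in). If the edge's ratio is ~0 (heavy filtering, blocked sink), the
    // computed target collapses to 0.
    static double targetRate(double upstreamTargetRate, double outputRatio) {
        return upstreamTargetRate * outputRatio;
    }

    // Parallelism is then sized so the vertex can handle the target rate at the
    // configured utilization, given a per-subtask processing capacity.
    static int requiredParallelism(double targetRate,
                                   double perSubtaskRate,
                                   double targetUtilization,
                                   int minParallelism) {
        int p = (int) Math.ceil(targetRate / (perSubtaskRate * targetUtilization));
        return Math.max(p, minParallelism);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: edge output ratio 0.0, as in the logs below.
        double target = targetRate(47_000.0, 0.0);
        // Target rate 0.0 clamps parallelism to the configured minimum (10),
        // even if every subtask reports busyTimeMsPerSecond = 1000 (100% busy).
        System.out.println(requiredParallelism(target, 40.0, 0.4, 10)); // prints 10
    }
}
{code}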
However, we observe cases where an operator/sub-job remains *100% busy* across
all subtasks, yet is aggressively scaled _down_, sometimes to the bare minimum
parallelism, purely because the autoscaler's recursively computed output ratio
(or downstream demand) is low. This occurs in scenarios with heavy filtering,
aggregations, or temporarily slow/blocked sinks.
There is {*}currently no way to configure the Flink autoscaler to prioritize,
or exclusively use, the busy metric for scaling decisions{*}, even though this
would be more appropriate for certain classes of workloads.
----
*Behavior Observed (with Log Snapshots):*
We run a multi-stage pipeline in which one "hot" vertex is heavily loaded:
* *Vertex scaling metrics* (all subtasks busy):
{{Vertex scaling metrics for 20b73495bccfee0c65322e5852f3f496:
\{ACCUMULATED_BUSY_TIME=4021203.0, NUM_RECORDS_IN=47229.0, LOAD=1.0,
NUM_RECORDS_OUT=7834.0}}}
* *Autoscaler parallelism overrides* (initially scaling up due to the busy metric):
{{[DEBUG][flink/example-flink-pipeline] Applying parallelism overrides: \{ ...,
20b73495bccfee0c65322e5852f3f496=44, ... }}}
* *Output ratio computed near zero:*
{{Computed output ratio for edge (f134ca61df556898f135a2c691a13fc5 ->
20b73495bccfee0c65322e5852f3f496): 0.0}}
* *Target processing capacity is forced to zero, so a scale-down is
triggered:*
{{Vertex 20b73495bccfee0c65322e5852f3f496 processing rate 404.562 is outside
(0.0, 0.0)
[DEBUG] Applying parallelism overrides: \{...,
20b73495bccfee0c65322e5852f3f496=18, ...}
[DEBUG] Applying parallelism overrides: \{...,
20b73495bccfee0c65322e5852f3f496=10, ...}}}
* *Even at parallelism 10, the stage remains 100% busy:*
{{[{ "id": }}
{{"busyTimeMsPerSecond", }}
{{"min": 1000, }}
{{"max": 1000, }}
{{"avg": 1000, }}
{{"sum": 10000 }]}}
----
*Interpretation:*
Even though the operator is maxed out, the autoscaler computes a required
target rate of 0 from the output ratio / back-propagated demand, scales
parallelism down to the minimum, and leaves the vertex overloaded.
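For contrast, a busy-time-driven policy applied to the numbers above would
scale up rather than down. The formula here is an assumption for illustration,
not the operator's actual scaling math:
{code:java}
// Hypothetical busy-time-only sizing, using the observed values:
// parallelism = 10, busyTimeMsPerSecond = 1000, utilization target = 0.4.
public class BusyTimeOnlySizing {
    public static void main(String[] args) {
        double busyRatio = 1000.0 / 1000.0; // busyTimeMsPerSecond / 1000 => 100% busy
        int currentParallelism = 10;
        double targetUtilization = 0.4;
        int proposed = (int) Math.ceil(currentParallelism * busyRatio / targetUtilization);
        // A busy-driven policy would scale UP to ~25 subtasks to approach the
        // 0.4 utilization target, instead of the observed scale-down to the minimum.
        System.out.println(proposed); // prints 25
    }
}
{code}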
*Request:*
Add a configuration option or scaling policy/mode to the Flink autoscaler that
allows scaling (up or down) {*}primarily or solely on the busy metric{*}.
* Optionally, allow users to disable or strongly de-emphasize the output
ratio and base scaling strictly on busyTime for selected vertices or
globally (see the hypothetical sketch below).
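A possible shape for such an option; every key below is *hypothetical* and does
not exist in the current autoscaler configuration:
{code:yaml}
# Hypothetical keys: purely illustrative, not existing configuration options.
job.autoscaler.scaling-policy: "BUSY_TIME_ONLY"   # current behavior could stay the default
job.autoscaler.output-ratio.enabled: "false"      # disable output-ratio back-propagation globally
{code}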