Zhanghao Chen created FLINK-31827:
-------------------------------------
Summary: Incorrect estimation of the target data rate of a vertex
when only a subset of its upstream vertex's output is consumed
Key: FLINK-31827
URL: https://issues.apache.org/jira/browse/FLINK-31827
Project: Flink
Issue Type: Bug
Components: Autoscaler
Reporter: Zhanghao Chen
Attachments: image-2023-04-17-23-37-35-280.png
Currently, the target data rate of a vertex = SUM(target data rate *
input/output ratio) for all of its upstream vertices. This assumes that all
output records of an upstream vertex is consumed by the downstream vertex.
However, it does not always hold. Consider the following job plan generated by
a Flink SQL job. The middle vertex contains multiple chained Calc(select xx)
operators, each connecting to a separate downstream sink tasks. As a result,
each sink task only consumes a sub-portion of the middle vertex's output.
To fix it, we need operator level edge info to infer the upstream-downstream
relationship as well as operator level output metrics. The metrics part is easy
but AFAIK, there's no way to get the operator level edge info from the Flink
REST API yet.
!image-2023-04-17-23-37-35-280.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)