Zhanghao Chen created FLINK-31826: ------------------------------------- Summary: Incorrect estimation of the target data rate of a vertex when only a subset of its upstream vertex's output is consumed Key: FLINK-31826 URL: https://issues.apache.org/jira/browse/FLINK-31826 Project: Flink Issue Type: Improvement Components: Autoscaler Reporter: Zhanghao Chen Attachments: LHL7VKOG4B.jpg
Currently, a vertex's target data rate = the sum of its upstream vertex's target data rate * input/output ratio. This assumes that all of the upstream vertex output goes into the current vertex. However, it does not always hold. Consider the following job plan generated by a Flink SQL job. The vertex in the middle has multiple Calc(select xx) operators chained, each connects to a separate downstream tasks. The total num_rec_out_rate of the middle vertex = SUM num_rec_in_rate of its downstream tasks. To fix this problem, we need operator level output metrics and edge info. The operator level metrics part is easy, but AFAIK, there's no way to get the operator level edge info from the current Flink REST APIs. !LHL7VKOG4B.jpg! -- This message was sent by Atlassian Jira (v8.20.10#820010)