[ 
https://issues.apache.org/jira/browse/FLINK-31827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-31827:
-----------------------------------
    Labels: pull-request-available stale-assigned  (was: pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue is assigned but has not 
received an update in 30 days, so it has been labeled "stale-assigned".
If you are still working on the issue, please remove the label and add a 
comment updating the community on your progress.  If this issue is waiting on 
feedback, please consider this a reminder to the committer/reviewer. Flink is a 
very active project, and so we appreciate your patience.
If you are no longer working on the issue, please unassign yourself so someone 
else may work on it.


> Incorrect estimation of the target data rate of a vertex when only a subset 
> of its upstream vertex's output is consumed
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-31827
>                 URL: https://issues.apache.org/jira/browse/FLINK-31827
>             Project: Flink
>          Issue Type: Bug
>          Components: Autoscaler
>            Reporter: Zhanghao Chen
>            Assignee: Gyula Fora
>            Priority: Major
>              Labels: pull-request-available, stale-assigned
>         Attachments: image-2023-04-17-23-37-35-280.png
>
>
> Currently, the target data rate of a vertex = SUM(target data rate * 
> input/output ratio) for all of its upstream vertices. This assumes that all 
> output records of an upstream vertex is consumed by the downstream vertex. 
> However, it does not always hold. Consider the following job plan generated 
> by a Flink SQL job. The middle vertex contains multiple chained Calc(select 
> xx) operators, each connecting to a separate downstream sink tasks. As a 
> result, each sink task only consumes a sub-portion of the middle vertex's 
> output.
> To fix it, we need operator level edge info to infer the upstream-downstream 
> relationship as well as operator level output metrics. The metrics part is 
> easy but AFAIK, there's no way to get the operator level edge info from the 
> Flink REST API yet.
> !image-2023-04-17-23-37-35-280.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to