[ https://issues.apache.org/jira/browse/FLINK-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176925#comment-17176925 ]

Piotr Nowojski edited comment on FLINK-18808 at 8/13/20, 11:05 AM:
-------------------------------------------------------------------

But isn't that the whole point of the discussion? The version posted in your PR 
counts all of the outputs from the 3rd level (including {{ChainingOutput}}), and 
this is not what we want to achieve.

Given the following operators in a single task:

{noformat}
         /-> SideOutput1
A -> B -> C -> NetworkOutput
         \-> SideOutput2
{noformat}
(sorry my diagrams are not as nice as yours ;) )

For the task-level {{numRecordsOut}} (aka the number of records produced by the 
task), we want to report the number of records that reached {{SideOutput1 + 
SideOutput2 + NetworkOutput}}. That's the number of records produced by this 
task. Currently we ignore the side outputs of {{B}}, while your outstanding 
PR would incorrectly also count {{ChainingOutput B->C}}.

For the hypothetical {{numRecordsSent}} (nice to have, but not a must-have), we 
could report the number of records that were sent by {{SideOutput1 + 
SideOutput2 + NetworkOutput}}, taking into account that broadcasting multiplies 
the number of records.
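A minimal Java sketch of how the two counts could differ, under stated assumptions: the {{TaskOutput}} record, the helper methods and the channel counts are hypothetical illustrations, not Flink API. A broadcast record is counted once for {{numRecordsOut}} and once per downstream channel for {{numRecordsSent}}; a non-broadcast record goes to a single channel.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Flink API): numRecordsOut counts each emitted record
 * once, while a numRecordsSent-style metric multiplies broadcast records by the
 * number of downstream channels they are written to.
 */
public class RecordsSentSketch {

    /** A non-chained output of the task (side output or network output). */
    record TaskOutput(String name, long recordsEmitted, boolean broadcast, int downstreamChannels) {}

    /** Each emitted record counts once, regardless of partitioning. */
    static long numRecordsOut(List<TaskOutput> outputs) {
        long total = 0;
        for (TaskOutput o : outputs) {
            total += o.recordsEmitted();
        }
        return total;
    }

    /** A broadcast record is physically sent to every downstream channel. */
    static long numRecordsSent(List<TaskOutput> outputs) {
        long total = 0;
        for (TaskOutput o : outputs) {
            long copies = o.broadcast() ? o.downstreamChannels() : 1;
            total += o.recordsEmitted() * copies;
        }
        return total;
    }

    /** Illustrative numbers only: SideOutput2 broadcasts to 4 downstream channels. */
    static List<TaskOutput> exampleOutputs() {
        return List.of(
                new TaskOutput("SideOutput1", 100, false, 2),
                new TaskOutput("SideOutput2", 100, true, 4),
                new TaskOutput("NetworkOutput", 100, false, 3));
    }

    public static void main(String[] args) {
        List<TaskOutput> outputs = exampleOutputs();
        System.out.println("numRecordsOut  = " + numRecordsOut(outputs));  // 300
        System.out.println("numRecordsSent = " + numRecordsSent(outputs)); // 600
    }
}
```

With these illustrative numbers, {{numRecordsOut}} is 300 while {{numRecordsSent}} is 600, because the 100 records broadcast on {{SideOutput2}} are sent 4 times each.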

For the operator-level {{numRecordsOut}}, we currently have the following (and it 
should stay unchanged):
* Operator A: number of records that reached {{ChainingOutput A->B}}
* Operator B: {{SideOutput1 + SideOutput2}} + number of records that reached 
{{ChainingOutput B->C}}
* Operator C: number of records that reached {{NetworkOutput}}
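To make the three task-level numbers concrete, here is a hypothetical Java model of the chain in the diagram (the {{Output}} record, the helper methods and the record counts are illustrative assumptions, not Flink classes). It assumes that every operator-level counter reused for the task is summed into the task-level counter, and that {{B}}'s side outputs count as non-chained outputs under the condition proposed in the issue, so that condition reuses the counters of both {{B}} and {{C}}.

```java
import java.util.List;

/**
 * Hypothetical model (not Flink classes) of the chain A -> B -> C, comparing
 * three ways of deriving the task-level numRecordsOut.
 */
public class TaskRecordsOutSketch {

    /** An output owned by one operator; chaining outputs stay inside the task. */
    record Output(String owner, String name, boolean chaining, long recordsReached) {}

    /** Operator-level numRecordsOut: all records reaching the operator's outputs, chaining ones included. */
    static long operatorRecordsOut(List<Output> outputs, String operator) {
        long sum = 0;
        for (Output o : outputs) {
            if (o.owner().equals(operator)) {
                sum += o.recordsReached();
            }
        }
        return sum;
    }

    /** Desired task-level numRecordsOut: only records that leave the task. */
    static long desiredTaskRecordsOut(List<Output> outputs) {
        long sum = 0;
        for (Output o : outputs) {
            if (!o.chaining()) {
                sum += o.recordsReached();
            }
        }
        return sum;
    }

    /** Task-level value obtained by summing the reused counters of the given operators. */
    static long reusedTaskRecordsOut(List<Output> outputs, List<String> reusedOperators) {
        long sum = 0;
        for (String op : reusedOperators) {
            sum += operatorRecordsOut(outputs, op);
        }
        return sum;
    }

    /** Illustrative numbers only: 100 records reach every output. */
    static List<Output> exampleOutputs() {
        return List.of(
                new Output("A", "ChainingOutput A->B", true, 100),
                new Output("B", "SideOutput1", false, 100),
                new Output("B", "SideOutput2", false, 100),
                new Output("B", "ChainingOutput B->C", true, 100),
                new Output("C", "NetworkOutput", false, 100));
    }

    public static void main(String[] args) {
        List<Output> outputs = exampleOutputs();
        System.out.println("desired             = " + desiredTaskRecordsOut(outputs));                   // 300
        System.out.println("chain end (current) = " + reusedTaskRecordsOut(outputs, List.of("C")));      // 100
        System.out.println("B and C (proposed)  = " + reusedTaskRecordsOut(outputs, List.of("B", "C"))); // 400
    }
}
```

In this model the desired value is 300; reusing only the chain end's counter reports 100 because {{B}}'s side outputs are ignored, and reusing both {{B}}'s and {{C}}'s counters reports 400 because {{ChainingOutput B->C}} is counted as well.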



> Task-level numRecordsOut metric may be underestimated
> -----------------------------------------------------
>
>                 Key: FLINK-18808
>                 URL: https://issues.apache.org/jira/browse/FLINK-18808
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Metrics, Runtime / Task
>    Affects Versions: 1.11.1
>            Reporter: ming li
>            Assignee: ming li
>            Priority: Major
>              Labels: pull-request-available, usability
>         Attachments: image-2020-08-04-11-28-13-800.png, 
> image-2020-08-04-11-32-20-678.png, image-2020-08-13-18-36-13-282.png
>
>
> At present, we register the task-level numRecordsOut metric only by reusing 
> the output record counter of the operator at the end of the OperatorChain.
> {code:java}
> if (config.isChainEnd()) {
>    operatorMetricGroup.getIOMetricGroup().reuseOutputMetricsForTask();
> }
> {code}
> If we only send data out through the last operator of the OperatorChain, this 
> statistic is correct. But consider the following scenario:
> !image-2020-08-04-11-28-13-800.png|width=507,height=174!
> In this JobGraph, we send data not only from the last operator but also from 
> a middle operator of the OperatorChain (the map operator just returns the 
> original value directly). Below is one of our test topologies; we can see that 
> the statistic actually shows only half of the total data received by the 
> downstream tasks.
> !image-2020-08-04-11-32-20-678.png|width=648,height=251!
> I think the data sent out by intermediate operators should also be counted 
> in the task's numRecordsOut. But currently we do not reuse the output record 
> counters of intermediate operators, which leads to our task-level numRecordsOut 
> metric being underestimated (although this has no effect on the actual 
> operation of the job, it may affect our monitoring).
> A simple idea of mine is to modify the condition for reusing the operator's 
> output record counter:
> {code:java}
> if (!config.getNonChainedOutputs(getUserCodeClassloader()).isEmpty()) {
>    operatorMetricGroup.getIOMetricGroup().reuseOutputMetricsForTask();
> }
> {code}
> In addition, I have another question: if a record is broadcast to all 
> downstream channels, should the numRecordsOut counter increase by one or by the 
> number of downstream channels? It seems that currently we add one when 
> calculating the numRecordsOut metric.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
