[ 
https://issues.apache.org/jira/browse/FLINK-10150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587716#comment-16587716
 ] 

ASF GitHub Bot commented on FLINK-10150:
----------------------------------------

zentol opened a new pull request #6599: [FLINK-10150][metrics] Fix 
OperatorMetricGroup creation for Batch
URL: https://github.com/apache/flink/pull/6599
 
 
   ## What is the purpose of the change
   
   This PR fixes a severe issue in the metric system where chained batch 
operators would always operate on the same `OperatorMetricGroup`. As a result, 
most Flink-provided metrics were not exposed for chained operators at all, 
while other metrics, like task-level IO metrics, were rendered incorrect.
   
   The problem is that we used the task's `VertexID` to identify operators, 
and that ID is necessarily identical for all operators in a chain. We now use 
the `VertexID` together with the operator name to identify them.
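   
   To illustrate the collision, here is a minimal, self-contained sketch. It is 
not the actual `TaskMetricGroup` code: the class `OperatorKeyDemo`, the 
`MetricGroup` stand-in, both method names, and the `/` key separator are 
hypothetical and chosen only for this example. It assumes that groups are 
looked up in a map and created only when the key is absent.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OperatorKeyDemo {

    /** Hypothetical stand-in for a per-operator metric group. */
    public static final class MetricGroup {
        public final String operatorName;

        public MetricGroup(String operatorName) {
            this.operatorName = operatorName;
        }
    }

    /** Keys groups by vertex ID only: every operator in a chain gets the same group. */
    public static Map<String, MetricGroup> registerByVertexId(
            String vertexId, List<String> operatorNames) {
        Map<String, MetricGroup> groups = new HashMap<>();
        for (String name : operatorNames) {
            // The key is the same on every iteration, so only the first operator creates a group.
            groups.computeIfAbsent(vertexId, key -> new MetricGroup(name));
        }
        return groups;
    }

    /** Keys groups by vertex ID plus operator name: one group per chained operator. */
    public static Map<String, MetricGroup> registerByVertexIdAndName(
            String vertexId, List<String> operatorNames) {
        Map<String, MetricGroup> groups = new HashMap<>();
        for (String name : operatorNames) {
            // Assumed key format for illustration only.
            String compositeKey = vertexId + "/" + name;
            groups.computeIfAbsent(compositeKey, key -> new MetricGroup(name));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> chain = Arrays.asList("DataSource", "FlatMap", "Combine");
        System.out.println(registerByVertexId("vertex-1", chain).size());        // prints 1
        System.out.println(registerByVertexIdAndName("vertex-1", chain).size()); // prints 3
    }
}
```

   With the vertex-only key, all three chained operators end up sharing a 
single group, so their counters would be merged. With the composite key, each 
operator gets its own group, provided operator names are distinct within a 
chain.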
   
   ## Brief change log
   
   * fix identification in `TaskMetricGroup` by using both the ID and operator 
name
   * extend `MockEnvironment[Builder]` to allow the `TaskMetricGroup` to be set
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   * `ChainedOperatorsMetricTest`
   * run a basic wordcount as described in the JIRA and verify the results via 
the UI/reporter of your choice
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Inconsistent number of "Records received" / "Records sent"
> ----------------------------------------------------------
>
>                 Key: FLINK-10150
>                 URL: https://issues.apache.org/jira/browse/FLINK-10150
>             Project: Flink
>          Issue Type: Bug
>          Components: Metrics, Webfrontend
>    Affects Versions: 1.4.0, 1.5.0, 1.6.0, 1.7.0
>            Reporter: Helmut Zechmann
>            Assignee: Chesnay Schepler
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.4.3, 1.6.1, 1.7.0, 1.5.4
>
>         Attachments: record_counts_flink_1_3.png, record_counts_flink_1_4.png
>
>
> The flink web ui displays an inconsistent number of "Records received" / 
> "Records sent" in the job overview "Subtasks" view.
> When I run the example wordcount batch job with a small input file on flink 
> 1.3.2 I get
>  * 3 records sent by the first subtask and
>  * 3 records received by the second subtask
> This is the result I would expect.
>  
> If I run the same job on flink 1.4.0 / 1.5.2 / 1.6.0 I get
>  * 13 records sent by the first subtask and
>  * 3 records received by the second subtask
> In real life jobs the numbers are much more strange.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
