[ https://issues.apache.org/jira/browse/FLINK-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725372#comment-15725372 ]

ASF GitHub Bot commented on FLINK-5179:
---------------------------------------

Github user uce commented on the issue:

    https://github.com/apache/flink/pull/2886
  
    Tested this vs. the current master and it works as expected. In the current 
master, the metrics are not updated after the TM disassociates from the JM. 
With this PR, the reporter is closed and restarted, and metrics are reported 
again.
    
    In general, I'm wondering whether we should introduce an explicit suspended 
state in which the metrics are still available and are only reset after recovery. 
I was testing this with JMX and wondered whether it could be problematic for 
users if the reporters are down during JM recovery.
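A minimal sketch of the suspended state suggested above, assuming the registry keeps its last values readable (for example to a JMX query) while the TM is disassociated, ignores updates until recovery, and resets the values once the connection is re-established. All names here (`RegistryState`, `SuspendableRegistry`, `suspend`, `recover`, `stop`) are hypothetical and are not part of Flink's `MetricRegistry` API. The metric name is illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical life-cycle states; not an enum that exists in Flink.
enum RegistryState { RUNNING, SUSPENDED, STOPPED }

// Hypothetical registry that keeps its last values readable while suspended.
class SuspendableRegistry {
    private RegistryState state = RegistryState.RUNNING;
    private final Map<String, Long> values = new HashMap<>();

    // Updates are only accepted while running.
    void update(String name, long value) {
        if (state == RegistryState.RUNNING) {
            values.put(name, value);
        }
    }

    // Reads (e.g. a JMX query) still succeed while suspended.
    Long read(String name) {
        return state == RegistryState.STOPPED ? null : values.get(name);
    }

    // Would be called on JM disassociation instead of a full shutdown.
    void suspend() {
        if (state == RegistryState.RUNNING) {
            state = RegistryState.SUSPENDED;
        }
    }

    // Would be called after recovery: drop the stale values and resume.
    void recover() {
        if (state == RegistryState.SUSPENDED) {
            values.clear();
            state = RegistryState.RUNNING;
        }
    }

    // Final shutdown, e.g. when the TM itself stops.
    void stop() {
        state = RegistryState.STOPPED;
        values.clear();
    }

    RegistryState state() { return state; }
}

class SuspendDemo {
    public static void main(String[] args) {
        SuspendableRegistry registry = new SuspendableRegistry();
        registry.update("numRecordsIn", 42L);
        registry.suspend();
        registry.update("numRecordsIn", 43L);              // ignored while suspended
        System.out.println(registry.read("numRecordsIn")); // 42
        registry.recover();
        System.out.println(registry.read("numRecordsIn")); // null (reset after recovery)
    }
}
```

With this design, reporters would keep exposing the pre-disconnect values during JM recovery instead of disappearing. The trade-off is that those values may be stale until recovery completes.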


> MetricRegistry life-cycle issues with HA
> ----------------------------------------
>
>                 Key: FLINK-5179
>                 URL: https://issues.apache.org/jira/browse/FLINK-5179
>             Project: Flink
>          Issue Type: Bug
>          Components: Metrics
>    Affects Versions: 1.2.0
>            Reporter: Chesnay Schepler
>            Assignee: Chesnay Schepler
>            Priority: Blocker
>             Fix For: 1.2.0
>
>
> The TaskManager's MetricRegistry is started when the TaskManager is created 
> and shut down in the TaskManager's postStop method.
> However, the registry is also shut down within the TaskManager's 
> disassociateFromJobManager method, and it is not restarted when the 
> connection is re-established.
> Effectively, this means that a TaskManager that has ever reconnected to a 
> JobManager will not report any metrics, since the reporters are shut down as 
> well. Nor will metrics be sent to the WebInterface anymore.
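The life-cycle problem described in the issue, and the restart-on-reconnect behaviour reported for the PR, can be modelled with a minimal sketch. Only the method name `disassociateFromJobManager` comes from the issue. `SketchReporter`, `SketchRegistry`, `SketchTaskManager`, `reconnectToJobManager` and the `restartOnReconnect` flag are hypothetical stand-ins, not Flink's actual classes or the PR's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reporter: records metrics only while it is open.
class SketchReporter {
    private boolean open;
    private final List<String> reported = new ArrayList<>();

    void open() { open = true; }
    void close() { open = false; }
    void report(String metric) { if (open) reported.add(metric); }
    List<String> reported() { return reported; }
}

// Hypothetical registry: opens its reporter on creation, closes it on shutdown.
class SketchRegistry {
    private final SketchReporter reporter;
    private boolean shutdown;

    SketchRegistry(SketchReporter reporter) {
        this.reporter = reporter;
        reporter.open();
    }

    void shutdown() {
        if (!shutdown) {
            reporter.close();
            shutdown = true;
        }
    }

    boolean isShutdown() { return shutdown; }

    void report(String metric) { if (!shutdown) reporter.report(metric); }
}

// Hypothetical TaskManager life-cycle; restartOnReconnect toggles the fix.
class SketchTaskManager {
    private final SketchReporter reporter;
    private final boolean restartOnReconnect;
    private SketchRegistry registry;

    SketchTaskManager(SketchReporter reporter, boolean restartOnReconnect) {
        this.reporter = reporter;
        this.restartOnReconnect = restartOnReconnect;
        this.registry = new SketchRegistry(reporter); // started when the TM is created
    }

    void disassociateFromJobManager() {
        registry.shutdown(); // registry and reporter both go down
    }

    void reconnectToJobManager() {
        // Without the fix nothing happens here, so the registry stays shut down.
        if (restartOnReconnect && registry.isShutdown()) {
            registry = new SketchRegistry(reporter); // reopens the reporter
        }
    }

    void postStop() { registry.shutdown(); }

    void reportMetric(String metric) { registry.report(metric); }
}

class LifecycleDemo {
    public static void main(String[] args) {
        for (boolean fix : new boolean[] {false, true}) {
            SketchReporter reporter = new SketchReporter();
            SketchTaskManager tm = new SketchTaskManager(reporter, fix);
            tm.reportMetric("before");
            tm.disassociateFromJobManager();
            tm.reconnectToJobManager();
            tm.reportMetric("after");
            tm.postStop();
            // prints "fix=false -> [before]", then "fix=true -> [before, after]"
            System.out.println("fix=" + fix + " -> " + reporter.reported());
        }
    }
}
```

Without the restart, every metric reported after the reconnect is silently dropped. This matches the symptom of a TaskManager that stops reporting after it has reconnected once.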



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
