Github user zentol commented on the issue: https://github.com/apache/flink/pull/2146

I moved the checkpoint metrics into the Tracker (and reverted the changes to ExecutionGraph). I am currently trying it out locally.

Regarding the exception catching in the metrics: I can't decide whether we should write all metrics in such a way that they can't throw exceptions, or write the reporters in such a way that they can deal with them (usually by logging the exception). The first option is safer considering custom reporters, but the second allows us to properly log the exceptions.

Regarding a test: while I agree that such a test would be nice, I can only come up with icky ways to write it. You have to access the metrics _while a job is running_, as they are removed afterwards. So you either have to submit a job that blocks until _something_ happens, or you add a reporter that feeds that information back to the test _somehow_.
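
To make the two exception-handling options concrete, here is a minimal sketch of both. The `Gauge` interface, the `nonThrowing` helper, the `LoggingReporter` class and the metric names are hypothetical stand-ins invented for illustration; they are not Flink's metric or reporter interfaces, and a real reporter would presumably use its logger instead of collecting strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified types for illustration only; NOT Flink's metric API.
public class MetricExceptionSketch {

    /** Minimal gauge abstraction: returns a value when polled. */
    interface Gauge<T> {
        T getValue();
    }

    /** Option 1: wrap the gauge so it can never throw; a fallback is returned instead. */
    static <T> Gauge<T> nonThrowing(Gauge<T> delegate, T fallback) {
        return () -> {
            try {
                return delegate.getValue();
            } catch (RuntimeException e) {
                // Safe for any reporter, but the failure is swallowed here.
                return fallback;
            }
        };
    }

    /** Option 2: the reporter guards every read and records the failure. */
    static class LoggingReporter {
        final Map<String, Object> reported = new LinkedHashMap<>();
        final List<String> errors = new ArrayList<>();

        void report(Map<String, Gauge<?>> gauges) {
            for (Map.Entry<String, Gauge<?>> entry : gauges.entrySet()) {
                try {
                    reported.put(entry.getKey(), entry.getValue().getValue());
                } catch (RuntimeException e) {
                    // Stand-in for a log statement; one bad gauge does not stop the others.
                    errors.add(entry.getKey() + ": " + e.getMessage());
                }
            }
        }
    }

    public static void main(String[] args) {
        Gauge<Long> healthy = () -> 42L;
        Gauge<Long> broken = () -> {
            throw new IllegalStateException("no completed checkpoint");
        };

        // Option 1: the caller only ever sees the fallback value.
        System.out.println(nonThrowing(broken, -1L).getValue());

        // Option 2: the reporter keeps going and keeps a record of the failure.
        Map<String, Gauge<?>> gauges = new LinkedHashMap<>();
        gauges.put("healthyGauge", healthy); // illustrative metric names
        gauges.put("brokenGauge", broken);
        LoggingReporter reporter = new LoggingReporter();
        reporter.report(gauges);
        System.out.println(reporter.reported);
        System.out.println(reporter.errors);
    }
}
```

The sketch mirrors the trade-off described above: the wrapper protects reporters that do no error handling of their own, while the guarded reporter is the only place where the exception is still available to be logged.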