[ https://issues.apache.org/jira/browse/FLINK-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643391#comment-16643391 ]
ASF GitHub Bot commented on FLINK-10135:
----------------------------------------

tillrohrmann commented on a change in pull request #6702: [FLINK-10135] The JobManager does not report the cluster-level metrics
URL: https://github.com/apache/flink/pull/6702#discussion_r223690671

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java
 ##########
 @@ -734,6 +744,18 @@ public void requestHeartbeat(ResourceID resourceID, Void payload) {
 		}
 	}

+	private void registerSlotAndTaskExecutorMetrics() {
+		jobManagerMetricGroup.gauge(
+			TASK_SLOTS_AVAILABLE_METRIC_NAME,
+			() -> (long) slotManager.getNumberFreeSlots());

 Review comment:
   I think it should be okish to call these methods from a different thread. All of them simply return the size of a `HashMap`. In the worst case, we would report a little bit stale data. The main danger I see is that the implementations of `SlotManager` might change, which makes this no longer hold true.

   If we wanted to make it "thread-safe", then we would need to make the access call inside of the `MainThreadExecutor`:
   ```
   jobManagerMetricGroup.gauge(
   	MetricNames.TASK_SLOTS_AVAILABLE,
   	() -> callAsync(() -> slotManager.getNumberFreeSlots(), timeout).get());
   ```

   However, this might be an overkill at the moment.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> The JobManager doesn't report the cluster-level metrics
> -------------------------------------------------------
>
>                 Key: FLINK-10135
>                 URL: https://issues.apache.org/jira/browse/FLINK-10135
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager, Metrics
>    Affects Versions: 1.5.0, 1.6.0, 1.7.0
>            Reporter: Joey Echeverria
>            Assignee: vinoyang
>            Priority: Critical
>              Labels: pull-request-available
>
> In [the documentation for
> metrics|https://ci.apache.org/projects/flink/flink-docs-release-1.5/monitoring/metrics.html#cluster]
> in the Flink 1.5.0 release, it says that the following metrics are reported
> by the JobManager:
> {noformat}
> numRegisteredTaskManagers
> numRunningJobs
> taskSlotsAvailable
> taskSlotsTotal
> {noformat}
> In the job manager REST endpoint
> ({{http://<job-manager>:8081/jobmanager/metrics}}), those metrics don't
> appear.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
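
The "thread-safe" variant mentioned in the review comment, where the gauge hands its read to the main thread instead of touching the `HashMap` from the metrics reporter thread, could look roughly like the sketch below. It is only an illustration and does not use Flink classes: `MainThread`, `SlotTracker`, `freeSlotsGauge`, and the `-1` fallback value are hypothetical stand-ins, and a plain single-threaded executor plays the role of the `MainThreadExecutor`. Note that `CompletableFuture.get()` throws checked exceptions, so the sketch handles them inside the gauge callback and uses a bounded wait.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class MainThreadGaugeSketch {

    /** Hypothetical stand-in for a slot manager; its map is only touched on the main thread. */
    static final class SlotTracker {
        private final Map<String, Boolean> freeSlots = new HashMap<>();

        void addFreeSlot(String slotId) {
            freeSlots.put(slotId, Boolean.TRUE);
        }

        int getNumberFreeSlots() {
            return freeSlots.size();
        }
    }

    /** Hypothetical stand-in for the endpoint's main thread: one thread, tasks run in order. */
    static final class MainThread implements AutoCloseable {
        private final ExecutorService executor = Executors.newSingleThreadExecutor();

        <V> CompletableFuture<V> callAsync(Supplier<V> call) {
            return CompletableFuture.supplyAsync(call, executor);
        }

        CompletableFuture<Void> runAsync(Runnable action) {
            return CompletableFuture.runAsync(action, executor);
        }

        @Override
        public void close() {
            executor.shutdown();
        }
    }

    /**
     * Builds a gauge that reads the free-slot count on the main thread and waits
     * at most timeoutMillis for the answer. Returning -1 on failure is an
     * assumption of this sketch, not Flink behaviour.
     */
    static Supplier<Long> freeSlotsGauge(MainThread mainThread, SlotTracker tracker, long timeoutMillis) {
        return () -> {
            CompletableFuture<Integer> future = mainThread.callAsync(tracker::getNumberFreeSlots);
            try {
                int free = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return (long) free;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return -1L;
            } catch (ExecutionException | TimeoutException e) {
                return -1L;
            }
        };
    }

    public static void main(String[] args) {
        try (MainThread mainThread = new MainThread()) {
            SlotTracker tracker = new SlotTracker();
            Supplier<Long> gauge = freeSlotsGauge(mainThread, tracker, 1000L);

            // Mutations are also routed through the main thread, so the map is
            // never accessed concurrently.
            mainThread.runAsync(() -> tracker.addFreeSlot("slot-1"));
            mainThread.runAsync(() -> tracker.addFreeSlot("slot-2"));

            // The gauge is evaluated from the calling thread, as a reporter would do.
            System.out.println("taskSlotsAvailable = " + gauge.get());
        }
    }
}
```

The cost of this approach is that every metric read becomes a blocking round trip through the main thread, which is consistent with the comment's view that it may be overkill while the getters only return a map size.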