[ https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418767#comment-15418767 ]
ASF GitHub Bot commented on FLINK-4389:
---------------------------------------

GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/2363

[FLINK-4389] Expose metrics to WebFrontend

This PR exposes metrics to the Webfrontend, as proposed in [FLIP-7](https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface).

This PR builds on top of #2300, meaning that 2866f56 is not part of the PR.

I've split the implementation into 5 commits that implement:
* the generation of a separate scope string for the WebInterface
* the MetricQueryService, a separate actor running on all Job-/TaskManagers whose main purpose is to create and return a dump of the metrics when queried
* the MetricStore, a nested data structure used in the WebInterface to store transmitted metrics
* the MetricFetcher, which the WebInterface uses to fetch metrics from Job-/TaskManagers
* various MetricsHandler classes, which handle REST calls requesting specific metrics

### MetricQueryService

The MetricQueryService is an actor running inside the MetricRegistry. It acts like an unscheduled reporter that is queried from the outside for a report. The MetricRegistry notifies it of added and removed metrics, while the MetricFetcher sends report requests to the JM/TM; these are forwarded to the MetricQueryService, which answers the MetricFetcher directly.

The report is a single `Object[]` that contains, for each metric:
1. the type of the metric, encoded as a byte (so that we know how many values are transmitted)
2. the fully qualified metric name (based on the separate WebInterface scope format)
3. the value(s) of the metric (Gauge values are converted to Strings)

### MetricStore

The MetricStore is a relatively simple nested data structure that contains one `HashMap<String, Object>` for every JM/TM/job/task. Received metrics are added to these HashMaps based on the format string. There is only a single MetricStore instance in the WebInterface.
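To make the report layout and the MetricStore more concrete, here is a minimal Java sketch that walks a flat `Object[]` dump and files each metric into one map per JM/TM/job/task. Everything concrete in it is an assumption for illustration, not the PR's actual encoding: the byte values of the type tags, the value counts (a counter as one `long`, a hypothetical three-value `HISTOGRAM`), the dotted name layout `jm.<metric>`, `tm.<tmId>.<metric>`, `job.<jobId>.<metric>`, `task.<jobId>.<taskId>.<metric>`, and the helper names `MetricType`, `valueCount` and `addDump`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical type tags: the PR encodes the type as a byte but does not give the values.
final class MetricType {
    static final byte COUNTER = 0;   // assumed: one value (a long count)
    static final byte GAUGE = 1;     // one value, already converted to a String
    static final byte HISTOGRAM = 2; // assumed: three values (count, min, max)

    private MetricType() {}

    /** The type tag tells the reader how many values follow the name. */
    static int valueCount(byte type) {
        switch (type) {
            case COUNTER:
            case GAUGE:
                return 1;
            case HISTOGRAM:
                return 3;
            default:
                throw new IllegalArgumentException("Unknown metric type: " + type);
        }
    }
}

/** Minimal MetricStore sketch: one HashMap<String, Object> per JM, TM, job and task. */
class MetricStore {
    final Map<String, Object> jobManager = new HashMap<>();
    final Map<String, Map<String, Object>> taskManagers = new HashMap<>(); // keyed by TM id
    final Map<String, Map<String, Object>> jobs = new HashMap<>();         // keyed by job id
    final Map<String, Map<String, Object>> tasks = new HashMap<>();        // keyed by "<jobId>.<taskId>"

    /** Walks a flat dump laid out as [type, name, value(s), type, name, value(s), ...]. */
    void addDump(Object[] dump) {
        int i = 0;
        while (i < dump.length) {
            byte type = (Byte) dump[i++];
            String name = (String) dump[i++];
            int n = MetricType.valueCount(type);
            Object value;
            if (n == 1) {
                value = dump[i];
            } else {
                Object[] values = new Object[n];
                System.arraycopy(dump, i, values, 0, n);
                value = values;
            }
            i += n;
            add(name, value);
        }
    }

    /** Files a metric under the map selected by its (hypothetical) dotted scope prefix. */
    private void add(String qualifiedName, Object value) {
        String[] head = qualifiedName.split("\\.", 2); // [category, remainder]
        if (head.length < 2) {
            throw new IllegalArgumentException("Malformed metric name: " + qualifiedName);
        }
        String rest = head[1];
        switch (head[0]) {
            case "jm":
                jobManager.put(rest, value);
                break;
            case "tm": {
                String[] p = rest.split("\\.", 2); // [tmId, metric]
                taskManagers.computeIfAbsent(p[0], k -> new HashMap<>()).put(p[1], value);
                break;
            }
            case "job": {
                String[] p = rest.split("\\.", 2); // [jobId, metric]
                jobs.computeIfAbsent(p[0], k -> new HashMap<>()).put(p[1], value);
                break;
            }
            case "task": {
                String[] p = rest.split("\\.", 3); // [jobId, taskId, metric]
                tasks.computeIfAbsent(p[0] + "." + p[1], k -> new HashMap<>()).put(p[2], value);
                break;
            }
            default:
                throw new IllegalArgumentException("Unknown scope in: " + qualifiedName);
        }
    }
}
```

Because a single `Object[]` carries every metric, the type byte is the only thing that tells the reader where one metric's values end and the next metric's type tag begins.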
### MetricFetcher

The MetricFetcher initiates the transfer and cleanup of metrics. It contains the MetricStore instance, which is accessed by the MetricsHandlers. Fetching is only done when a handler asks for it, with a minimum of 10 seconds between updates; as such, no fetching is done if the metrics are not accessed through REST calls. The fetching procedure can be summed up in pseudo-code as follows:

```
fetch():
    askJobManagerForJobDetails()
        => retain only the metrics belonging to the returned jobs
    askJobManagerForMetrics()
        => add received metrics to MetricStore
    askJobManagerForRegisteredTaskManagers()
        => retain only the metrics belonging to registered task managers
        => for each TaskManager:
               askTaskManagerForMetrics()
                   => add received metrics to MetricStore
```

### MetricsHandler

The MetricsHandlers deal with two kinds of requests:
* getAllAvailableMetrics - any REST request that does not have a `get` query parameter is treated as a request for all available metrics of the JM/TM/job/task denoted by the REST path. The reply is a JSON array, for example: `[{"id":"metric_1"},{"id":"metric_2"}]`
* getMetricValues - the Webfrontend can request the values of several metrics by passing a comma-separated list of metric IDs as the `get` query parameter. The reply is a JSON array of id/value pairs, for example `[{"id":"metric_1", "value":"4"}]`, or an empty string if an error occurred.
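The on-demand, rate-limited fetching and the two handler request kinds can be sketched in Java as follows. This is a hypothetical illustration, not the PR's classes: `ThrottledFetcher`, `MetricsHandlerSketch`, `update()` and `handle()` are invented names, the injectable clock exists only to make the 10-second rule testable, and the choices to sort the metric listing, skip unknown IDs, render values with `String.valueOf`, and map any `RuntimeException` to the empty-string error reply are assumptions.

```java
import java.util.Map;
import java.util.TreeSet;
import java.util.function.LongSupplier;

/** Hypothetical sketch: fetch only when asked, and at most once per interval. */
class ThrottledFetcher {
    static final long MIN_INTERVAL_MILLIS = 10_000; // "a minimum of 10 seconds between updates"

    private final Runnable fetch;     // stands in for the fetch() procedure
    private final LongSupplier clock; // e.g. System::currentTimeMillis; injectable for tests
    private long lastUpdate;
    private boolean fetchedOnce;

    ThrottledFetcher(Runnable fetch, LongSupplier clock) {
        this.fetch = fetch;
        this.clock = clock;
    }

    /** Called by a handler before reading the store; a no-op if the last fetch is too recent. */
    synchronized void update() {
        long now = clock.getAsLong();
        if (!fetchedOnce || now - lastUpdate >= MIN_INTERVAL_MILLIS) {
            fetchedOnce = true;
            lastUpdate = now;
            fetch.run();
        }
    }
}

/** Hypothetical sketch of the two request kinds a metrics handler answers. */
class MetricsHandlerSketch {
    /**
     * @param queryParams REST query parameters
     * @param metrics     the map for the JM/TM/job/task selected by the REST path
     */
    static String handle(Map<String, String> queryParams, Map<String, Object> metrics) {
        try {
            StringBuilder json = new StringBuilder("[");
            String get = queryParams.get("get");
            if (get == null) {
                // getAllAvailableMetrics: [{"id":"..."},...]; sorted here only for stable output
                for (String id : new TreeSet<>(metrics.keySet())) {
                    appendSeparator(json);
                    json.append("{\"id\":\"").append(escape(id)).append("\"}");
                }
            } else {
                // getMetricValues: one {"id":"...","value":"..."} entry per requested id
                for (String raw : get.split(",")) {
                    String id = raw.trim();
                    Object value = metrics.get(id);
                    if (value == null) {
                        continue; // assumption: unknown ids are silently skipped
                    }
                    appendSeparator(json);
                    json.append("{\"id\":\"").append(escape(id))
                        .append("\",\"value\":\"").append(escape(String.valueOf(value))).append("\"}");
                }
            }
            return json.append(']').toString();
        } catch (RuntimeException e) {
            return ""; // "an empty string if an error occurred"
        }
    }

    private static void appendSeparator(StringBuilder json) {
        if (json.length() > 1) {
            json.append(',');
        }
    }

    /** Escapes quotes, backslashes and control characters for a JSON string literal. */
    private static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

A handler would call `update()` before `handle()`, so repeated REST calls within the same 10-second window are served from the store without triggering another round of requests to the JM and TMs.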
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/flink 4389_metrics_exposed

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2363.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2363

----
commit ea0e4d892717f042acf26ec9653a2371d7b21028
Author: zentol <ches...@apache.org>
Date: 2016-07-27T09:25:27Z

    [FLINK-4245] Expose all defined variables

commit ea1154644566f8009ccda64a0acbdde7d59ad235
Author: zentol <ches...@apache.org>
Date: 2016-08-05T11:54:37Z

    Implement Query Scope

    Modifies various MetricGroups to return a separate scope for the query service.

commit 3791a94529d703351dffb284ed3d5d19f1ce272c
Author: zentol <ches...@apache.org>
Date: 2016-08-05T11:49:10Z

    Implement MetricQueryService

    Used on the JM/TM to create a key-value representation of all metrics.

commit a0e1418decc8a3a4b53da15dc744f1702247db9f
Author: zentol <ches...@apache.org>
Date: 2016-08-05T11:48:06Z

    Implement MetricStore

    Data structure used in the WebInterface to store the transmitted metrics.

commit 2bab6cc32c139f5969a276e385ed5afd6c6a46ea
Author: zentol <ches...@apache.org>
Date: 2016-08-08T12:52:01Z

    Implement MetricFetcher

    The MetricFetcher regularly fetches metrics from the JM and all TM's.

commit de4aeaf1e0958b49531adae198345b87ccd260bd
Author: zentol <ches...@apache.org>
Date: 2016-08-05T11:48:22Z

    Implement various MetricsHandler

    Handlers that answers metric related queries.
----

> Expose metrics to Webfrontend
> -----------------------------
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
> Issue Type: Sub-task
> Components: Metrics, Webfrontend
> Affects Versions: 1.1.0
> Reporter: Chesnay Schepler
> Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)