[ https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158023#comment-15158023 ]
Dongwon Kim edited comment on FLINK-1502 at 2/23/16 4:44 AM:
-------------------------------------------------------------

Let's consider the following scenario:
||Sessions||Node1||Node2||Node3||
|Session1|TM1|TM2|TM3|
|Session2|TM2|TM3|TM1|
|Session3|TM3|TM2|TM1|

After Session1 is finished, Ganglia adds the following metric to the list of metrics for Node1:
- Flink.taskmanager.1.gc_time

After Session2 is finished, Ganglia adds the following metric to the list of metrics for Node1:
- Flink.taskmanager.2.gc_time

After Session3 is finished, Ganglia adds the following metric to the list of metrics for Node1:
- Flink.taskmanager.3.gc_time

Around this time, Ganglia has three metrics for each node.
The problem gets worse if the user has to launch many more TaskManagers.
For example, 500 TaskManagers over multiple sessions will end up creating 500 metrics for each host.

Wouldn't it be better to assign indexes to TaskManagers scoped to each host?


was (Author: eastcirclek):
Let's consider the following scenario:
||Sessions||Node1||Node2||Node3||
|Session1|TM1|TM2|TM3|
|Session2|TM2|TM3|TM1|
|Session3|TM3|TM2|TM1|

After Session1 is finished, Node1 has the following metrics:
- cluster.MyCluster.taskmanager.1.gc_time

After Session2 is finished, Node1 has the following metrics:
- cluster.MyCluster.taskmanager.1.gc_time
- cluster.MyCluster.taskmanager.2.gc_time

After Session3 is finished, Node1 has the following metrics:
- cluster.MyCluster.taskmanager.1.gc_time
- cluster.MyCluster.taskmanager.2.gc_time
- cluster.MyCluster.taskmanager.3.gc_time

Around this time, a user has to check which of the above three metrics belongs to the current session.
The problem gets worse if the user has to launch many more TaskManagers.
For example, 500 TaskManagers over multiple sessions will end up creating 500 metrics for each host.

Wouldn't it be better to assign indexes to TaskManagers scoped to each host?

p.s.
I'm going to start without considering multiple TaskManagers on the same node, as we haven't yet reached a consensus. But I think we still need to develop this discussion further.


> Expose metrics to graphite, ganglia and JMX.
> --------------------------------------------
>
>                 Key: FLINK-1502
>                 URL: https://issues.apache.org/jira/browse/FLINK-1502
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager, TaskManager
>    Affects Versions: 0.9
>            Reporter: Robert Metzger
>            Assignee: Dongwon Kim
>            Priority: Minor
>             Fix For: pre-apache
>
>
> The metrics library allows collected metrics to be exposed easily to other
> systems such as graphite, ganglia or Java's JVM (VisualVM).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
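
The naming scenario in the comment can be replayed in a few lines of Java. This is only a rough sketch, not Flink or Ganglia code: the class `MetricNameSketch` and its methods are hypothetical, the session table is copied from the comment, and the host-scoped variant assumes one TaskManager per node (so the local index is always 1), which matches the starting point mentioned in the p.s.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Flink.
class MetricNameSketch {
    // Rows are Session1..Session3, columns are Node1..Node3. Each value is the
    // cluster-wide TaskManager index taken from the table in the comment.
    static final int[][] SESSIONS = {
        {1, 2, 3},
        {2, 3, 1},
        {3, 2, 1},
    };

    // Metric names a node accumulates across sessions when the cluster-wide
    // TaskManager index is embedded in the metric name.
    static Set<String> clusterWideNames(int[][] sessions, int node) {
        Set<String> names = new LinkedHashSet<>();
        for (int[] session : sessions) {
            names.add("Flink.taskmanager." + session[node] + ".gc_time");
        }
        return names;
    }

    // Metric names a node accumulates when the index is scoped to the host.
    // Assumption: one TaskManager per node, so the local index is always 1.
    static Set<String> hostScopedNames(int[][] sessions, int node) {
        Set<String> names = new LinkedHashSet<>();
        for (int s = 0; s < sessions.length; s++) {
            int localIndex = 1;
            names.add("Flink.taskmanager." + localIndex + ".gc_time");
        }
        return names;
    }

    public static void main(String[] args) {
        // Node1 is column 0 of the session table.
        System.out.println("cluster-wide: " + clusterWideNames(SESSIONS, 0));
        System.out.println("host-scoped:  " + hostScopedNames(SESSIONS, 0));
    }
}
```

For Node1, the cluster-wide variant collects one name per distinct TaskManager index it has ever hosted, while the host-scoped variant keeps reusing a single name across sessions.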