[ https://issues.apache.org/jira/browse/KAFKA-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803057#comment-13803057 ]
Joel Koshy commented on KAFKA-1100:
-----------------------------------

That's a good point - we don't need it to be that way. The metric names that you referred to are derived from the consumer's registration in ZooKeeper. There are a couple of cleanup tasks we need to do for mbeans, especially wrt consumers:
- The names need not include timestamps. The reason we have timestamps and a hash in there is that, without them, two consumers brought up under the same group on the same host at nearly the same time would have colliding registrations in ZooKeeper. Realistically this only happens in system tests, so it should be fine to drop the timestamp and hash for metrics registration.
- Metrics are not de-registered on a rebalance/shutdown. I think there is already a jira for the shutdown case, but I'm compiling a list of other shortcomings and will file an umbrella jira to cover most of these issues.
- I think the de-registration issues affect replica fetchers as well (need to check), i.e., if a broker transitions from follower to leader for a partition, the follower metrics for that partition need to be de-registered.

> metrics shouldn't have generation/timestamp specific names
> ----------------------------------------------------------
>
>                 Key: KAFKA-1100
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1100
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Rosenberg
>
> I've noticed that there are several metrics that seem useful for monitoring
> over time, but which contain generational timestamps in the metric name.
> We are using yammer metrics libraries to send metrics data in a background
> thread every 10 seconds (to kafka actually), and then they eventually end up
> in a metrics database (graphite, opentsdb). The metrics then get graphed via
> UI, and we can see metrics going way back, etc.
> Unfortunately, many of the metrics coming from kafka seem to have metric
> names that change any time the server or consumer is restarted, which makes
> it hard to easily create graphs over long periods of time (spanning app
> restarts).
> For example:
> names like:
> kafka.consumer.FetchRequestAndResponseMetrics....square-1371718712833-e9bb4d10-0-508818741-AllBrokersFetchRequestRateAndTimeMs
> or:
> kafka.consumer.ZookeeperConsumerConnector...topicName.....square-1373476779391-78aa2e83-0-FetchQueueSize
> In our staging environment, we have our servers on regular auto-deploy cycles
> (they restart every few hours). So it's just not longitudinally usable to have
> metric names constantly changing like this.
> Is there something that can easily be done? Is it really necessary to have
> so much cryptic info in the metric name?

--
This message was sent by Atlassian JIRA
(v6.1#6144)
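
For anyone who needs stable names before the cleanup lands, here is a minimal sketch of stripping the generation-specific segment on the reporting side. It is not Kafka code and not the proposed fix. The function name `stable_metric_name` is hypothetical, and the pattern (a 13-digit millisecond timestamp followed by an 8-character hex hash) is an assumption based only on the two example names quoted in the issue.

```python
import re

# Assumption: the generation-specific part looks like
# "-<13-digit epoch millis>-<8 hex chars>", as in the example names
# quoted in the issue. Other Kafka versions may format it differently.
_GENERATION_SEGMENT = re.compile(r"-\d{13}-[0-9a-f]{8}(?=-|$)")


def stable_metric_name(name: str) -> str:
    """Return `name` with any timestamp-and-hash segment removed.

    Hypothetical helper for a metrics reporter; names without such a
    segment are returned unchanged.
    """
    return _GENERATION_SEGMENT.sub("", name)


if __name__ == "__main__":
    # Uses only the visible tail of one example name from the issue.
    print(stable_metric_name("square-1373476779391-78aa2e83-0-FetchQueueSize"))
```

Note that dropping the segment reintroduces the collision the timestamp and hash were meant to avoid: two consumers in the same group on the same host would then report under the same name.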