[ https://issues.apache.org/jira/browse/KAFKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804755#comment-15804755 ]
ASF GitHub Bot commented on KAFKA-4441:
---------------------------------------

GitHub user edoardocomar opened a pull request:

    https://github.com/apache/kafka/pull/2325

    KAFKA-4441 Monitoring incorrect during topic creation and deletion

    OfflinePartitionsCount PreferredReplicaImbalanceCount metrics check for topic being deleted

    Added integration test which polls the metrics while topics are being created and deleted

    Developed with @mimaison

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/edoardocomar/kafka KAFKA-4441

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2325.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2325

----
commit a793d249e2653255eb62a0d1b9c4a2b99c917b11
Author: Mickael Maison <mickael.mai...@gmail.com>
Date:   2016-12-15T13:26:51Z

    KAFKA-4441 Monitoring incorrect during topic creation and deletion

    OfflinePartitionsCount PreferredReplicaImbalanceCount metrics check for topic being deleted

    Added integration test which polls the metrics while topics are being created and deleted

    Developed with @mimaison

----

> Kafka Monitoring is incorrect during rapid topic creation and deletion
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-4441
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4441
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.0.0, 0.10.0.1
>            Reporter: Tom Crayford
>            Assignee: Edoardo Comar
>
> Kafka reports several metrics off the state of partitions:
> UnderReplicatedPartitions
> PreferredReplicaImbalanceCount
> OfflinePartitionsCount
> All of these metrics trigger when rapidly creating and deleting topics in a
> tight loop, although the actual causes of the metrics firing are from topics
> that are undergoing creation/deletion, and the cluster is otherwise stable.
> Looking through the source code, topic deletion goes through an asynchronous
> state machine:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35.
> However, the metrics do not know about the progress of this state machine:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185
>
> I believe the fix to this is relatively simple - we need to make the metrics
> know that a topic is currently undergoing deletion or creation, and only
> include topics that are "stable"
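
The fix idea in the issue description (count only partitions of topics that
are not currently being deleted) can be sketched roughly as below. This is
not the code from pull request 2325 and not Kafka's controller code: the
class StableTopicMetrics, the PartitionState type, the NO_LEADER constant and
both method names are hypothetical, and the real controller tracks this state
differently. It only illustrates filtering by a "topics being deleted" set
before computing the two metrics named in the pull request.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these names exist in Kafka itself.
class StableTopicMetrics {

    // Assumed marker for "this partition has no live leader".
    static final int NO_LEADER = -1;

    // Simplified stand-in for the controller's view of one partition.
    static final class PartitionState {
        final String topic;
        final int leader;            // current leader broker id, or NO_LEADER
        final int preferredReplica;  // first broker in the replica assignment

        PartitionState(String topic, int leader, int preferredReplica) {
            this.topic = topic;
            this.leader = leader;
            this.preferredReplica = preferredReplica;
        }
    }

    // Counts partitions without a leader, skipping topics queued for deletion.
    static long offlinePartitionsCount(List<PartitionState> partitions,
                                       Set<String> topicsBeingDeleted) {
        return partitions.stream()
                .filter(p -> !topicsBeingDeleted.contains(p.topic))
                .filter(p -> p.leader == NO_LEADER)
                .count();
    }

    // Counts partitions whose leader is not the preferred replica,
    // again skipping topics queued for deletion.
    static long preferredReplicaImbalanceCount(List<PartitionState> partitions,
                                               Set<String> topicsBeingDeleted) {
        return partitions.stream()
                .filter(p -> !topicsBeingDeleted.contains(p.topic))
                .filter(p -> p.leader != p.preferredReplica)
                .count();
    }

    public static void main(String[] args) {
        List<PartitionState> partitions = Arrays.asList(
                new PartitionState("orders", 1, 1),            // healthy
                new PartitionState("orders", 2, 1),            // leader is not preferred
                new PartitionState("payments", NO_LEADER, 1),  // offline, stable topic
                new PartitionState("tmp", NO_LEADER, 2));      // offline, being deleted
        Set<String> deleting = new HashSet<>(Collections.singletonList("tmp"));

        System.out.println(offlinePartitionsCount(partitions, deleting));
        System.out.println(preferredReplicaImbalanceCount(partitions, deleting));
    }
}
```

With an empty deletion set the "tmp" partition is counted by both methods,
which is the kind of spurious alert the issue describes for topics that are
merely in the middle of being deleted.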