Tom Crayford created KAFKA-4441:
-----------------------------------

             Summary: Kafka Monitoring is incorrect during rapid topic creation and deletion
                 Key: KAFKA-4441
                 URL: https://issues.apache.org/jira/browse/KAFKA-4441
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 0.10.0.1, 0.10.0.0
            Reporter: Tom Crayford
            Priority: Minor


Kafka reports several metrics derived from the state of partitions:
UnderReplicatedPartitions
PreferredReplicaImbalanceCount
OfflinePartitionsCount

All of these metrics fire when topics are rapidly created and deleted in a 
tight loop, even though the only partitions causing them to fire belong to 
topics undergoing creation or deletion and the cluster is otherwise stable.

Looking through the source code, topic deletion goes through an asynchronous 
state machine: 
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35.


However, the metrics do not know about the progress of this state machine: 
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185
 


I believe the fix is relatively simple: the metrics need to know when a 
topic is currently undergoing creation or deletion, and should only 
include topics that are "stable".
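
To illustrate the idea, here is a minimal sketch in Python rather than the 
controller's Scala. It is not Kafka's actual code: TopicState, Partition and 
the two counting functions are hypothetical names invented for this example, 
and the real controller tracks topic state differently. It only shows how 
filtering on a per-topic "stable" flag changes what the metrics report.

```python
from dataclasses import dataclass
from enum import Enum


class TopicState(Enum):
    # Hypothetical lifecycle states; not taken from the Kafka source.
    CREATING = "creating"
    STABLE = "stable"
    DELETING = "deleting"


@dataclass(frozen=True)
class Partition:
    topic: str
    assigned_replicas: int
    in_sync_replicas: int
    has_leader: bool


def _counts_toward_metrics(partition, topic_states, stable_only):
    """Return True if this partition's topic should be included in a metric."""
    if not stable_only:
        return True
    return topic_states.get(partition.topic) is TopicState.STABLE


def under_replicated_count(partitions, topic_states, stable_only=True):
    """Count partitions with fewer in-sync replicas than assigned replicas."""
    return sum(
        1
        for p in partitions
        if p.in_sync_replicas < p.assigned_replicas
        and _counts_toward_metrics(p, topic_states, stable_only)
    )


def offline_count(partitions, topic_states, stable_only=True):
    """Count partitions that currently have no leader."""
    return sum(
        1
        for p in partitions
        if not p.has_leader
        and _counts_toward_metrics(p, topic_states, stable_only)
    )


# Made-up cluster snapshot: one long-lived topic plus two short-lived ones.
topic_states = {
    "orders": TopicState.STABLE,
    "tmp-a": TopicState.DELETING,
    "tmp-b": TopicState.CREATING,
}
partitions = [
    Partition("orders", assigned_replicas=3, in_sync_replicas=3, has_leader=True),
    Partition("orders", assigned_replicas=3, in_sync_replicas=2, has_leader=True),
    Partition("tmp-a", assigned_replicas=3, in_sync_replicas=0, has_leader=False),
    Partition("tmp-b", assigned_replicas=3, in_sync_replicas=1, has_leader=False),
]

# Counts as reported today (every topic included) versus stable topics only.
naive_urp = under_replicated_count(partitions, topic_states, stable_only=False)
stable_urp = under_replicated_count(partitions, topic_states)
naive_offline = offline_count(partitions, topic_states, stable_only=False)
stable_offline = offline_count(partitions, topic_states)
```

With this snapshot, the unfiltered counts include the partitions of tmp-a 
and tmp-b, while the filtered counts still report the one genuinely 
under-replicated partition of the stable topic.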



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
