Siddharth Ahuja created KAFKA-13572:
---------------------------------------

             Summary: Negative value for 'Preferred Replica Imbalance' metric
                 Key: KAFKA-13572
                 URL: https://issues.apache.org/jira/browse/KAFKA-13572
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 2.7.0
            Reporter: Siddharth Ahuja


A negative value (-822) has been observed for the metric 
{{kafka_controller_kafkacontroller_preferredreplicaimbalancecount}}. Please see 
the attached screenshot and the output below:

{code:java}
$ curl -s http://localhost:9101/metrics | fgrep 'kafka_controller_kafkacontroller_preferredreplicaimbalancecount'
# HELP kafka_controller_kafkacontroller_preferredreplicaimbalancecount Attribute exposed for management (kafka.controller<type=KafkaController, name=PreferredReplicaImbalanceCount><>Value)
# TYPE kafka_controller_kafkacontroller_preferredreplicaimbalancecount gauge
kafka_controller_kafkacontroller_preferredreplicaimbalancecount -822.0
{code}

The issue appeared after an operation in which the number of partitions was 
increased for some topics, and other topics were deleted and re-created in 
order to decrease their number of partitions.

I ran the following command to check whether there are any partitions where the 
preferred leader (the 1st broker in the Replicas list) is not the current Leader:
 
{code:java}
% grep ".*Topic:.*Partition:.*Leader:.*Replicas:.*Isr:.*Offline:.*" kafka-topics_describe.out \
    | awk '{print $6 " " $8}' | cut -d "," -f1 \
    | awk '{print $0, ($1==$2?_:"NOT") "MATCHED"}' | grep NOT | wc -l
     0
{code}

but could not find any such instances.
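
For anyone who wants to repeat this check without the awk pipeline, here is a rough Python sketch of the same comparison. It assumes the usual per-partition line layout of the {{kafka-topics --describe}} output (Topic, Partition, Leader, Replicas); the function name {{non_preferred_leaders}} is made up for illustration and is not part of any Kafka tooling.

```python
import re

# Matches per-partition lines of `kafka-topics --describe` output, e.g.
#   Topic: foo  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3  Offline:
# The layout is an assumption based on the grep pattern used above.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)\s+Replicas:\s*(?P<replicas>[\d,]+)"
)


def non_preferred_leaders(describe_output):
    """Return (topic, partition, leader, preferred) for every partition
    whose current leader is not the first broker in its replica list."""
    mismatches = []
    for line in describe_output.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # topic summary lines and anything unrecognised
        preferred = m.group("replicas").split(",")[0]
        if m.group("leader") != preferred:
            mismatches.append(
                (m.group("topic"), int(m.group("partition")),
                 m.group("leader"), preferred)
            )
    return mismatches
```

Calling it on the contents of {{kafka-topics_describe.out}} and taking {{len()}} of the result should correspond to the {{wc -l}} count above.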

{{leader.imbalance.per.broker.percentage=2}} is set for all the brokers in the 
cluster, which means that an imbalance of up to 2% is allowed for preferred 
leaders. This seems to be a valid value, so this setting should not contribute 
towards a negative metric.
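
To make the threshold concrete, here is a minimal sketch of a per-broker imbalance ratio as I understand it: for each broker, the fraction of the partitions it is the preferred leader for that it does not currently lead. This is my reading of the setting, not the controller's actual code, and the function names are hypothetical. The point is only that such a ratio is bounded by 0 and 1, so the threshold on its own cannot explain a negative count.

```python
from collections import defaultdict


def imbalance_ratio_per_broker(assignments):
    """assignments: iterable of (leader, replicas) pairs, one per partition.

    Returns {broker: fraction of the partitions it is preferred leader for
    that it does not currently lead}. Illustrative only.
    """
    preferred_total = defaultdict(int)
    not_led = defaultdict(int)
    for leader, replicas in assignments:
        preferred = replicas[0]  # preferred leader = first replica
        preferred_total[preferred] += 1
        if leader != preferred:
            not_led[preferred] += 1
    return {b: not_led[b] / preferred_total[b] for b in preferred_total}


def brokers_over_threshold(assignments, percentage=2):
    """Brokers whose ratio exceeds percentage/100 (2 mirrors the setting above)."""
    ratios = imbalance_ratio_per_broker(assignments)
    return sorted(b for b, r in ratios.items() if r > percentage / 100)
```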

The metric appears to be decremented in the code 
[here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/ControllerContext.scala#L474-L503]
 ; however, in the absence of any comments or debug/trace level logs in the 
code, it is not clear when it can become negative (i.e. decremented more often 
than incremented). One thing is certain: either there is no imbalance (0) or 
there is some imbalance (> 0); it doesn’t make sense for the metric to be < 0. 
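
To illustrate the invariant I would expect, here is a toy model of an incrementally maintained counter. It is NOT Kafka's {{ControllerContext}}; the class and method names are hypothetical. If every decrement is paired with an earlier increment for the same partition, the running count always equals a from-scratch recount and can never drop below zero.

```python
class ImbalanceTracker:
    """Toy model of an incrementally maintained imbalance gauge.

    Hypothetical illustration only; it does not mirror Kafka's code.
    """

    def __init__(self):
        self.state = {}  # partition -> (leader, preferred)
        self.count = 0   # running gauge value

    @staticmethod
    def _imbalanced(entry):
        return entry is not None and entry[0] != entry[1]

    def update(self, partition, leader, preferred):
        # Undo the old contribution (if any) before applying the new one.
        if self._imbalanced(self.state.get(partition)):
            self.count -= 1
        self.state[partition] = (leader, preferred)
        if self._imbalanced(self.state[partition]):
            self.count += 1

    def remove(self, partition):
        # Decrement only if this partition was actually counted.
        if self._imbalanced(self.state.pop(partition, None)):
            self.count -= 1

    def recount(self):
        # Ground truth recomputed from the current state.
        return sum(1 for e in self.state.values() if self._imbalanced(e))
```

In a model like this, a negative value would mean that some path decrements without a matching increment, for example around topic deletion or re-creation. That is only a guess about where to look.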

FWIW, no other anomalies besides this have been detected.




--
This message was sent by Atlassian Jira
(v8.20.1#820001)
