For some reason, I am not able to get the “under-replicated partitions” metric 
on my Kafka cluster to zero across all nodes. Even after I manually reassign 
all the partitions, one server still has 928 under-replicated partitions. Also, 
the number of partitions each server is leading is very uneven: it ranges from 
268 up to 2,098.
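
For anyone who wants to cross-check numbers like these outside of the broker metrics, here is a rough sketch that tallies leaders and under-replicated partitions per broker from the text output of kafka-topics.sh --describe. It is not how the figures above were obtained. It assumes the usual "Partition: N  Leader: N  Replicas: a,b  Isr: a,b" line layout, and the function name summarize is made up for illustration. An under-replicated partition is counted against its leader, which is how I understand the broker metric to be reported.

```python
import re
from collections import Counter

# Matches the per-partition lines of `kafka-topics.sh --describe` output.
# Assumed layout: "Partition: 0  Leader: 1  Replicas: 1,2  Isr: 1,2"
LINE_RE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(\S+)\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def summarize(describe_output):
    """Return (leaders_per_broker, under_replicated_per_leader) as Counters."""
    leaders = Counter()
    under_replicated = Counter()
    for line in describe_output.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # topic summary lines and blank lines are skipped
        leader = match.group(2)
        replicas = match.group(3).split(",")
        isr = [broker for broker in match.group(4).split(",") if broker]
        leaders[leader] += 1
        # Fewer in-sync replicas than assigned replicas: under-replicated.
        if len(isr) < len(replicas):
            under_replicated[leader] += 1
    return leaders, under_replicated
```

Feeding it the saved describe output and printing both counters should show at a glance which broker leads the most partitions and which leaders have partitions with a short ISR.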

In server.log, I see many messages like this for various partitions:
DateTime=[2018-09-27 18:57:21,133] Type=WARN Message="[ReplicaManager 
broker=167927108] While recording the replica LEO, the partition 
funnel-metrics-3 hasn't been created." Class=(kafka.server.ReplicaManager)

Also, kafka-reassign-partitions.sh “--verify” shows many occurrences of 
“Reassignment of partition such-and-such-0 failed” for various partitions.

Meanwhile, on clients trying to write messages into Kafka, I see messages like:
“logger=org.apache.kafka.clients.NetworkClient, , message="Error while fetching 
metadata with correlation id 274 : 
{alpha-checkout-event=INVALID_REPLICATION_FACTOR}"”

and:
“logger=org.apache.kafka.clients.producer.internals.Sender, , message="Got 
error produce response with correlation id 580 on topic-partition 
usersignals-14, retrying (10 attempts left). Error: NOT_LEADER_FOR_PARTITION"”

and:
“logger=c.expedia.www.hendrix.generator.framework.kafka.KafkaConsumerRunnable, 
, message="Kafka producer asynchronous send Future failed. Topic: 
tnl-exposure-logs Partition: null"”

Does anyone have any idea what the problem is, or what I can do about it? 
Thanks!
