Hello everyone!

We are seeing some odd behavior in Kafka and were wondering whether anyone 
has come across the same issue and, if so, whether you were able to fix it.  
Here’s the setup:

8 brokers in the cluster.  Kafka 0.10.0.0.

One topic, and only one topic on this cluster, is having issues where ISRs 
continuously shrink and expand but never stabilize.  This starts once roughly 
50,000 messages per second come in, and it gets worse when the rate increases 
to 110,000 messages per second.  Messages are small.  Total inbound is only 
about 50 MB/s.

There are no errors in the logs. We just get countless messages like these:

[2016-09-09 12:54:07,147] INFO Partition [topic_a,11] on broker 4: Expanding 
ISR for partition [topic_a,11] from 4 to 4,2 (kafka.cluster.Partition)
[2016-09-09 12:54:23,070] INFO Partition [topic_a,11] on broker 4: Shrinking 
ISR for partition [topic_a,11] from 4,2 to 4 (kafka.cluster.Partition)
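
To put a number on the flapping, the entries above could be tallied per 
partition. This is only a minimal sketch: it assumes each log entry has been 
joined back onto a single line (they are wrapped in this mail), and the names 
ISR_EVENT and count_isr_events are made up for illustration.

```python
import re
from collections import Counter

# Matches the kafka.cluster.Partition INFO entries quoted above.
# Assumption: each entry is on one line (unwrapped from the mail formatting).
ISR_EVENT = re.compile(
    r"\[(?P<ts>[^\]]+)\] INFO Partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] "
    r"on broker (?P<broker>\d+): (?P<kind>Expanding|Shrinking) ISR"
)


def count_isr_events(lines):
    """Count ISR expand/shrink events, keyed by (topic, partition, kind)."""
    counts = Counter()
    for line in lines:
        match = ISR_EVENT.search(line)
        if match:
            key = (match["topic"], int(match["partition"]), match["kind"])
            counts[key] += 1
    return counts


# The two entries from this mail, joined onto single lines.
sample = [
    "[2016-09-09 12:54:07,147] INFO Partition [topic_a,11] on broker 4: "
    "Expanding ISR for partition [topic_a,11] from 4 to 4,2 (kafka.cluster.Partition)",
    "[2016-09-09 12:54:23,070] INFO Partition [topic_a,11] on broker 4: "
    "Shrinking ISR for partition [topic_a,11] from 4,2 to 4 (kafka.cluster.Partition)",
]

for key, n in sorted(count_isr_events(sample).items()):
    print(key, n)
```

Run over a full broker log, this would show whether the expand/shrink cycles 
are spread evenly across the 16 partitions or concentrated on a few brokers.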

This topic has transient data that is unimportant after 20 minutes, so losing 
some due to a cluster shutdown isn’t that important, and we also don’t mind if 
messages are occasionally dropped.  With this in mind we have these settings:
Partitions = 16
Producer ACKs = 1
Replication factor = 2
min.insync.replicas = 1

CPU is sitting fairly idle at ~18%, and a thread dump and profile showed that 
most threads are sitting idle as well – very little contention if any.

We tried increasing the number of partitions from 16 to 24, but that seems 
only to have increased CPU usage (from 18% to 23%) and the number of Under 
Replicated Partitions.

Any advice or insight is appreciated. Thank you all!

Lawrence
