Hello everyone.  I’m having a hell of a time figuring out a Kafka performance 
issue in AWS. Any help is greatly appreciated!

Here is our AWS configuration:


- Zookeeper cluster (3.4.6): 3 nodes on m4.xlarge instances (default
configuration), EBS volumes (sd1)

- Kafka cluster (0.10.0): 3 nodes on m4.2xlarge instances (config:
https://gist.github.com/anduill/710bb0619a80019016ac85bb5c060440), EBS volumes
(sd1)
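
For reference, here is the quick check I run against the server.properties
linked above to see which of the ISR/replication-related settings differ from
the defaults. This is just a rough sketch: the key list and default values are
the ones I believe matter, taken from the 0.10.0 docs, and the file path is a
placeholder.

import java.io.FileInputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class ConfigCheck {
    public static void main(String[] args) throws Exception {
        // ISR/replication-related broker settings and their 0.10.0 defaults
        // (defaults per the Kafka documentation; the list is not exhaustive)
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("replica.lag.time.max.ms", "10000");
        defaults.put("num.replica.fetchers", "1");
        defaults.put("replica.fetch.wait.max.ms", "500");
        defaults.put("zookeeper.session.timeout.ms", "6000");
        defaults.put("num.network.threads", "3");
        defaults.put("num.io.threads", "8");

        Properties broker = new Properties();
        // path to the broker config is a placeholder
        try (FileInputStream in = new FileInputStream("/etc/kafka/server.properties")) {
            broker.load(in);
        }

        // print the effective value, or the default if the key is not overridden
        for (Map.Entry<String, String> e : defaults.entrySet()) {
            String actual = broker.getProperty(e.getKey(), e.getValue() + " (default)");
            System.out.printf("%-32s %s%n", e.getKey(), actual);
        }
    }
}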

Usage:

Our usage of the cluster is fairly modest (at least I think so). At peak hours,
each broker receives about 1.4 MB/sec. Our primary input topic has 54
partitions with replication factor 3 (producers use acks=all). A consumer reads
this topic and spreads the messages across 8 other topics, each with 8
partitions and replication factor 2 (again acks=all). Downstream, 4 more
consumers read these topics (one of them consumes the 8 previous topics,
transforms the messages, and sends the new messages to 8 further topics with
acks=1). In all, we end up with about 206 partitions at an average replication
factor of 2.26.
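
For context, our producers are configured roughly like this. This is a
simplified sketch, not our actual code; the broker addresses, topic name, and
serializers are placeholders.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // broker list is a placeholder
        props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
        // the replicated topics are written with acks=all;
        // the last hop in the pipeline uses acks=1
        props.put("acks", "all");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("input-topic", "key", "value"));
        producer.close();
    }
}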

Our Problem:

Our cluster will hum along just fine when suddenly one or more brokers start
experiencing severe ISR shrinking/expanding. This causes under-replicated
partitions, and the producer purgatory size starts growing rapidly on the
affected brokers, which in some cases causes downstream producers to fall
behind.
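
When this happens, these are the broker metrics I watch, polled over JMX. A
minimal sketch, assuming JMX is exposed on port 9999 and that the MBean names
match what 0.10.0 registers (they may differ slightly by version).

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class IsrWatch {
    public static void main(String[] args) throws Exception {
        // JMX port/host are placeholders; use whatever JMX_PORT the broker exposes
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();

            // ISR churn: should sit near zero on a healthy broker
            ObjectName shrinks = new ObjectName(
                "kafka.server:type=ReplicaManager,name=IsrShrinksPerSec");
            ObjectName expands = new ObjectName(
                "kafka.server:type=ReplicaManager,name=IsrExpandsPerSec");
            // partitions this broker leads that are missing in-sync replicas
            ObjectName urp = new ObjectName(
                "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions");
            // acks=all produce requests parked waiting on follower acks
            ObjectName purgatory = new ObjectName(
                "kafka.server:type=DelayedOperationPurgatory,"
                + "delayedOperation=Produce,name=PurgatorySize");

            System.out.println("IsrShrinksPerSec (1m rate): "
                + mbs.getAttribute(shrinks, "OneMinuteRate"));
            System.out.println("IsrExpandsPerSec (1m rate): "
                + mbs.getAttribute(expands, "OneMinuteRate"));
            System.out.println("UnderReplicatedPartitions:  "
                + mbs.getAttribute(urp, "Value"));
            System.out.println("ProducePurgatorySize:       "
                + mbs.getAttribute(purgatory, "Value"));
        } finally {
            jmxc.close();
        }
    }
}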

In the Kafka configuration above, we have a couple of non-default settings, but
nothing seems to stand out.  Is there anything obvious I'm missing (or need to
add/adjust)?  Or is there a bug I should be aware of that could cause these
issues?

-David
