Sorry, had a typo in my gist.  Here is the correct location:

https://gist.github.com/anduill/710bb0619a80019016ac85bb5c060440

On 10/19/16, 4:33 PM, "David Garcia" <dav...@spiceworks.com> wrote:

    Hello everyone.  I’m having a hell of a time figuring out a Kafka 
performance issue in AWS. Any help is greatly appreciated!
    
    Here is our AWS configuration:
    
    
    -  Zookeeper Cluster (3.4.6): 3 nodes on m4.xlarge instances (default 
configuration), EBS volumes (sd1)
    
    -  Kafka Cluster (0.10.0): 3 nodes on m4.2xlarge instances (config: 
https://gist.github.com/anduill/710bb0619a80019016ac85bb5c060440), EBS 
volumes (sd1)
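    
    For reference, a minimal sketch of a sanity check that confirms all three 
brokers are registered and which one is the controller, using the Java 
AdminClient (the AdminClient ships with client versions 0.11 and later; the 
broker hostnames and class name below are placeholders, not our real ones):
    
        import java.util.Properties;
        
        import org.apache.kafka.clients.admin.AdminClient;
        import org.apache.kafka.clients.admin.AdminClientConfig;
        import org.apache.kafka.clients.admin.DescribeClusterResult;
        
        public class ClusterCheck {
            public static void main(String[] args) throws Exception {
                Properties props = new Properties();
                // Placeholder endpoints for the three m4.2xlarge brokers.
                props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                        "broker1:9092,broker2:9092,broker3:9092");
        
                try (AdminClient admin = AdminClient.create(props)) {
                    DescribeClusterResult cluster = admin.describeCluster();
                    System.out.println("Brokers:    " + cluster.nodes().get());
                    System.out.println("Controller: " + cluster.controller().get());
                }
            }
        }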
    
    Usage:
    
    Our usage of the cluster is fairly modest (at least I think so). At peak 
hours, each broker receives about 1.4 MB/sec. Our primary input topic has about 
54 partitions with replication factor 3 (producers use acks=all). A consumer 
reads this topic and spreads the messages across 8 other topics, each with 8 
partitions and replication factor 2 (again acks=all). Downstream, 4 more 
consumers read these topics; one of them consumes those 8 topics, transforms 
the messages, and sends the results to 8 further topics (acks=1). In all, we 
end up with about 206 partitions at an average replication factor of 2.26.
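    
    For context, a minimal sketch of what an acks=all producer against this 
cluster looks like (the topic name, broker hostnames, and class name are 
placeholders, not our real ones). With acks=all the leader waits for the full 
in-sync replica set before acknowledging each batch:
    
        import java.util.Properties;
        
        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerConfig;
        import org.apache.kafka.clients.producer.ProducerRecord;
        import org.apache.kafka.common.serialization.StringSerializer;
        
        public class AcksAllProducer {
            public static void main(String[] args) {
                Properties props = new Properties();
                // Placeholder bootstrap servers for the 3-node cluster.
                props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                        "broker1:9092,broker2:9092,broker3:9092");
                props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                        StringSerializer.class.getName());
                props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                        StringSerializer.class.getName());
                // acks=all: the leader waits for the full in-sync replica set
                // before acknowledging the write.
                props.put(ProducerConfig.ACKS_CONFIG, "all");
        
                try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                    // "primary-input" stands in for the 54-partition input topic.
                    producer.send(new ProducerRecord<>("primary-input", "key", "value"));
                    producer.flush();
                }
            }
        }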
    
    Our Problem:
    
    Our cluster hums along just fine until, suddenly, one or more brokers start 
experiencing severe ISR shrinking/expanding. This causes under-replicated 
partitions, and the producer purgatory size starts to grow rapidly on the 
affected brokers, which in turn causes downstream producers to fall behind in 
some cases.
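    
    For reference, a rough sketch of the kind of JMX probe that surfaces these 
numbers on a given broker (the host/port and class name are placeholders, and 
the exact MBean names may vary slightly between Kafka versions):
    
        import javax.management.MBeanServerConnection;
        import javax.management.ObjectName;
        import javax.management.remote.JMXConnector;
        import javax.management.remote.JMXConnectorFactory;
        import javax.management.remote.JMXServiceURL;
        
        public class IsrWatch {
            public static void main(String[] args) throws Exception {
                // Placeholder JMX endpoint for one affected broker
                // (requires JMX_PORT to be set when the broker starts).
                JMXServiceURL url = new JMXServiceURL(
                        "service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi");
                try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                    MBeanServerConnection mbs = connector.getMBeanServerConnection();
        
                    // Broker metrics that correspond to the symptoms above.
                    ObjectName isrShrinks = new ObjectName(
                            "kafka.server:type=ReplicaManager,name=IsrShrinksPerSec");
                    ObjectName underReplicated = new ObjectName(
                            "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions");
                    ObjectName producePurgatory = new ObjectName(
                            "kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Produce");
        
                    System.out.println("ISR shrinks/sec (1m rate):   "
                            + mbs.getAttribute(isrShrinks, "OneMinuteRate"));
                    System.out.println("Under-replicated partitions: "
                            + mbs.getAttribute(underReplicated, "Value"));
                    System.out.println("Produce purgatory size:      "
                            + mbs.getAttribute(producePurgatory, "Value"));
                }
            }
        }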
    
    In the Kafka configuration above, we have a couple of non-default settings, 
but nothing seems to stand out. Is there anything obvious I'm missing (or need 
to add/adjust)? Or is there a bug I should be aware of that would cause these 
issues?
    
    -David
    
