[ https://issues.apache.org/jira/browse/KAFKA-13367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Heinze updated KAFKA-13367:
----------------------------------
    Attachment: KafkaChaosTests.png

> Performance Degradation during introducing Network Delay
> --------------------------------------------------------
>
>                 Key: KAFKA-13367
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13367
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.5.1
>         Environment: We are running Kafka 2.5 on m4.xlarge VMs on AWS.
>            Reporter: Thomas Heinze
>            Priority: Major
>         Attachments: KafkaChaosTests.png
>
>
> Hi Kafka community,
>
> We are running a few chaos experiments to simulate Kafka's behaviour during
> issues in the data center. To simulate a slow network, we run the following
> command on two out of six brokers (the brokers are spread across 3 AZs on
> AWS; we run the command on two brokers in the same AZ):
> {code:java}
> tc qdisc add dev eth0 root netem delay <x>ms
> {code}
>
> At the same time we are running some Kafka producers inserting roughly 4k
> messages per second into a Kafka topic with 10 partitions, 3 replicas and
> min.insync.replicas=2. What we observe is the following:
> * *Introducing a 1000 ms delay*: The producers see significant response time
> delays, and the throughput drops to 2k messages per second.
> * *Introducing a 2000 ms delay*: The producer delay increases further, and the
> throughput drops to 300 messages per second.
> * *Introducing a 5000 ms delay*: The Kafka cluster removes the slow brokers
> from the list of active replicas, and the incoming message rate on the
> remaining brokers increases. This is the expected behaviour imho.
>
> What parameters influence this behaviour? How can I make Kafka behave as it
> does with the 5000 ms delay even for smaller delays? We would like to make
> sure that we can guarantee roughly a certain throughput, even if one AZ is
> very slow.
>
> I already tried setting "replica.lag.time.max.ms" to very small values, but
> then I only observe that Kafka constantly adds the replicas on the slow nodes
> to the ISR set and removes them again.
>
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
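
For anyone reproducing the experiment, the three delay runs described in the report could be scripted roughly as below. This is only a dry-run sketch: it prints the `tc` commands rather than executing them, since running them needs root on the broker host. The helper names `netem_add_cmd` and `netem_del_cmd` are hypothetical, and the cleanup command is an assumption about how each run is reset, which the report does not state.

```shell
#!/bin/sh
# Dry-run sketch: prints the netem commands for each experiment instead of
# executing them. Helper names are hypothetical, not from the report.
IFACE="eth0"   # interface name as used in the report

# Build the command that adds a fixed delay of $1 milliseconds on $IFACE.
netem_add_cmd() {
    echo "tc qdisc add dev $IFACE root netem delay ${1}ms"
}

# Build the command that removes the netem qdisc again after a run
# (assumed cleanup step; the report does not say how runs were reset).
netem_del_cmd() {
    echo "tc qdisc del dev $IFACE root netem"
}

# The three delays mentioned in the report.
for delay_ms in 1000 2000 5000; do
    netem_add_cmd "$delay_ms"
    netem_del_cmd
done
```

To apply a delay for real, the printed line would be run as root on each of the two affected brokers, and the matching `del` line once the measurement is finished.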