[jira] [Updated] (KAFKA-13367) Performance Degradation during introducing Network Delay

Thomas Heinze (Jira) Tue, 12 Oct 2021 01:15:06 -0700


     [ 
https://issues.apache.org/jira/browse/KAFKA-13367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Thomas Heinze updated KAFKA-13367:
----------------------------------
    Attachment: KafkaChaosTests.png

> Performance Degradation during introducing Network Delay
> --------------------------------------------------------
>
>                 Key: KAFKA-13367
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13367
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.5.1
>         Environment: We are running Kafka 2.5 on m4.xlarge VMs on AWS.
>            Reporter: Thomas Heinze
>            Priority: Major
>         Attachments: KafkaChaosTests.png
>
>
> Hi Kafka community,
>  
> We are running a few chaos experiments to observe how Kafka behaves during
> problems in the data center. To simulate a slow network, we run the following
> command on two of our six brokers (the brokers are spread across 3 AZs on
> AWS; both affected brokers are in the same AZ):
> {code:bash}
> # <x> is a placeholder for the delay in milliseconds
> tc qdisc add dev eth0 root netem delay <x>ms
> {code}
>  
> At the same time, several Kafka producers write roughly 4k messages per
> second to a topic with 10 partitions and 3 replicas, configured with
> min.insync.replicas=2. We observe the following:
>  * *Introducing a 1000 ms delay*: The producers see significantly higher
> response times, and throughput drops to 2k messages per second.
>  * *Introducing a 2000 ms delay*: Producer response times increase further,
> and throughput drops to 300 messages per second.
>  * *Introducing a 5000 ms delay*: The Kafka cluster removes the slow brokers
> from the set of in-sync replicas (ISR), and the incoming message rate on the
> remaining brokers increases. This is the expected behaviour imho.
> Which parameters influence this behaviour? How can I make Kafka behave as it
> does with the 5000 ms delay even when the delay is smaller? We would like to
> guarantee roughly a given throughput, even if one AZ is very slow.
> I already tried setting "replica.lag.time.max.ms" to very small values, but
> then I only observe that Kafka constantly removes the replicas on the slow
> nodes from the ISR and adds them back again.
>  
>  
>  
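
The delay injection described in the report can be sketched as a small dry-run script. This is a minimal sketch, not the reporter's actual tooling: the function names netem_add_cmd and netem_del_cmd are hypothetical, and the interface (eth0) and delay values (1000, 2000, 5000 ms) are taken from the description above. It only prints the tc commands so they can be reviewed before being run as root on the affected brokers.

```shell
#!/bin/sh
# Dry-run sketch: print the tc/netem commands for each chaos run.
# Assumptions: interface eth0 and delays of 1000, 2000, 5000 ms, as in the report.
# The function names are hypothetical helpers, not part of tc or Kafka.

# Print the command that adds a fixed egress delay of $2 milliseconds on interface $1.
netem_add_cmd() {
    printf 'tc qdisc add dev %s root netem delay %sms\n' "$1" "$2"
}

# Print the command that deletes the root qdisc on interface $1, removing the delay.
netem_del_cmd() {
    printf 'tc qdisc del dev %s root\n' "$1"
}

# Print one add/delete pair per delay value; nothing is executed here.
for delay_ms in 1000 2000 5000; do
    netem_add_cmd eth0 "$delay_ms"
    netem_del_cmd eth0
done
```

In a real experiment the add and delete commands would be run separately, with the producer measurement taken in between, rather than piping the whole output to a shell at once.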



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
