Hi,

I am building a Kafka cluster and running the producer perf test to measure Kafka latency. From the test results, I notice that the long-tail latency is very high and increases over time, although the 99.9th percentile result looks very good. The worst-case latency can exceed 1 second. Disk utilization is always very low, never more than 1%. I also tried tuning log.flush.interval.ms from 1000 ms down to 200 ms, but it did not help much.
Below is the max latency chart. The Y axis represents the max latency in milliseconds, and the X axis represents the elapsed time in milliseconds. The chart shows the latency increasing gradually from about 10 ms to 1095 ms.

[image: Inline image]

The Kafka cluster is built on 4 hosts. The version is 2.9.2-0.8.2-beta. The PerfTopic15 topic was created with 3 partitions and a replication factor of 3.

Here is how I run the perf script:

-bash-4.1$ bin/kafka-producer-perf-test.sh --broker-list <broker list> --topics PerfTopic15 --sync --initial-message-id 1 --messages 200000 --csv-reporter-enabled --metrics-dir /tmp/PerfTopic15_1 --message-send-gap-ms 20 --request-num-acks -1 --batch-size 1

And here is the topic description:

-bash-4.1$ bin/kafka-topics.sh --zookeeper <zkHost>:2181 --describe --topic PerfTopic15
Topic:PerfTopic15  PartitionCount:3  ReplicationFactor:3  Configs:
    Topic: PerfTopic15  Partition: 0  Leader: 3  Replicas: 3,4,1  Isr: 3,4,1
    Topic: PerfTopic15  Partition: 1  Leader: 4  Replicas: 4,1,2  Isr: 4,1,2
    Topic: PerfTopic15  Partition: 2  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3

I expect the worst-case latency not to exceed 100 milliseconds, so the test result is very discouraging. Do you have any pointers on this Kafka long-tail latency issue?

Hope for your reply! Thanks in advance!