Hi Xinyi,

With ack = -1 and three replicas in the ISR, the latency is bounded most of the time by how long the follower replicas take to fetch from the leader, since the produce request cannot be acknowledged until every replica in the ISR has fetched the data.
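To make that bound concrete, here is a toy model of it. This is not Kafka code: the function name and the millisecond values are made up for illustration, and it only assumes that the leader replies once the slowest ISR follower has fetched the message.

```python
def produce_latency_ms(leader_append_ms, follower_fetch_ms):
    """Toy model of one produce request with ack = -1.

    leader_append_ms: time for the leader to append the message locally.
    follower_fetch_ms: per-follower time until that follower has fetched
        the message (hypothetical numbers, not measured values).
    """
    # The leader can only respond once every in-sync follower has the data,
    # so the slowest follower determines the latency.
    return leader_append_ms + max(follower_fetch_ms, default=0.0)


# Two followers in the ISR: one fast, one slow.
print(produce_latency_ms(1.0, [4.0, 90.0]))  # 91.0
```

In this model a single slow follower fetch is enough to put a request in the long tail, even if the leader's own append is fast.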
You can try to reduce "replica.fetch.wait.max.ms" and increase "num.replica.fetchers" in the broker configs:

http://kafka.apache.org/documentation.html#brokerconfigs

But note that this will increase the CPU / network usage.

Guozhang

On Tue, Feb 3, 2015 at 1:37 AM, Xinyi Su <xiny...@gmail.com> wrote:

> Hi,
> I am building Kafka cluster and run producer perf test to get Kafka latency
> performance.
> From test result, I notice that the long tail latency is very high and
> increased with time passing by although the 99.9% result looks very good.
> The worst latency can reach more than 1 second. Besides, disk utilization
> is always very low, never more than 1%. I also try to tune
> log.flush.interval.ms from 1000ms to 200ms. It does not help much.
>
> Below is the max latency chart, Y axis represents the max latency in
> millisecond, X axis represents the time elapsed in milliseconds. From
> chart, we can see the latency increasing from about 10ms to 1095ms
> gradually.
>
> [image: Inline image]
>
> Kafka cluster is built up with 4 hosts. The version is 2.9.2-0.8.2-beta.
> The PerfTopic15 topic is created with 3 partition and 3 replication.
>
> Here is my perf script usage:
> -bash-4.1$ bin/kafka-producer-perf-test.sh --broker-list <broker list>
> --topics PerfTopic15 --sync --initial-message-id 1 --messages 200000
> --csv-reporter-enabled --metrics-dir /tmp/PerfTopic15_1
> --message-send-gap-ms 20 --request-num-acks -1 --batch-size 1
>
> -bash-4.1$ bin/kafka-topics.sh --zookeeper <zkHost>:2181 --describe
> --topic PerfTopic15
> Topic:PerfTopic15 PartitionCount:3 ReplicationFactor:3 Configs:
> Topic: PerfTopic15 Partition: 0 Leader: 3 Replicas: 3,4,1 Isr: 3,4,1
> Topic: PerfTopic15 Partition: 1 Leader: 4 Replicas: 4,1,2 Isr: 4,1,2
> Topic: PerfTopic15 Partition: 2 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
>
> I expect the worst latency not exceed 100 milliseconds. But the test result
> is very discouraging.
> Do you have some points about Kafka long tail latency
> issue?
>
> Hope for your reply! Thanks in advance!
>

--
-- Guozhang