Hi,

I've just created a Jira ticket to summarize the results of my analysis and propose a mitigation for the latency spikes:
https://issues.apache.org/jira/browse/KAFKA-9693

Please have a look at the ticket. Do you see any important implications or risks in making this change?
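
For context, here is a back-of-the-envelope sketch of why segment rolls could show up as spikes at a roughly constant interval. It assumes traffic is spread evenly across partitions, so every partition fills its active segment in about the same time. The function name and the 800 MiB/s throughput are hypothetical and chosen only for illustration; they are not measurements from my test. The 1 GiB segment size is Kafka's default log.segment.bytes.

```python
def seconds_between_rolls(segment_bytes: int,
                          topic_bytes_per_sec: float,
                          partitions: int) -> float:
    """Estimate how long one partition takes to fill a log segment.

    Assumes writes are distributed evenly across all partitions.
    """
    per_partition_rate = topic_bytes_per_sec / partitions
    return segment_bytes / per_partition_rate


# Kafka's default log.segment.bytes (1 GiB).
SEGMENT_BYTES = 1024 ** 3

# Hypothetical aggregate throughput, for illustration only.
TOPIC_BYTES_PER_SEC = 800 * 1024 ** 2

interval = seconds_between_rolls(SEGMENT_BYTES, TOPIC_BYTES_PER_SEC, partitions=90)
print(f"estimated roll interval: {interval:.1f} s")
```

Under these assumptions, doubling the throughput halves the interval, which would be consistent with the spike frequency depending on the load.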
Thanks,
Paolo

On Tue, 18 Feb 2020 at 14:42, Paolo Moriello <paolomoriell...@gmail.com> wrote:

> Hello,
>
> I'm investigating Kafka latency. During my analysis I was able to
> reproduce a scenario in which Kafka latency repeatedly spikes at a
> constant frequency, for short periods of time.
>
> In my tests, in particular, latency could spike every ~2 minutes
> (depending on the throughput and input...) from an avg of ~3ms up to a
> max of +500ms (p95-p99).
>
> See image: https://imagizer.imageshack.com/img922/5308/glhkO4.png
>
> Further investigation showed that this is most likely caused by log
> segments being rolled over.
>
> Has anybody ever noticed anything like that? Do you know if it is
> possible to tune p99 performance in order to reduce or eliminate the
> latency spikes?
>
> Thanks,
> Paolo
>
> Test configuration:
>
> - 15 brokers
> - 6 producers, acks=1, no compression
> - 1 topic, 90 partitions
> - Kafka 2.2.1