There are some fixes in 0.8.2-beta for periodic latency spikes if you are using acks=-1 in the producer.
-Jay On Tue, Nov 11, 2014 at 10:50 AM, Wes Chow <w...@chartbeat.com> wrote: > > We're seeing periodic spikes in req/sec rates across our nodes. Our > cluster is 10 nodes, and the topic has a replication factor of 3. We push > around 200k messages / sec into Kafka. > > > The machines are running the most recent version of Kafka and we're > connecting via librdkafka. pingstream02-10 are using the CMS garbage > collector, but I switched pingstream01 to use G1GC under the theory that > maybe these were GC pauses. The graph shows that likely didn't improve the > situation. > > My next thought is that maybe this is the effect of log rolling. Checking > in the logs, I see a lot of this: > > [2014-11-11 13:46:45,836] 72952071 [ReplicaFetcherThread-0-7] INFO > kafka.log.Log - Rolled new log segment for 'pings-342' in 3 ms. > [2014-11-11 13:46:47,116] 72953351 [kafka-request-handler-0] INFO > kafka.log.Log - Rolled new log segment for 'pings-186' in 2 ms. > [2014-11-11 13:46:48,155] 72954390 [ReplicaFetcherThread-0-8] INFO > kafka.log.Log - Rolled new log segment for 'pings-253' in 3 ms. > [2014-11-11 13:46:48,408] 72954643 [ReplicaFetcherThread-0-4] INFO > kafka.log.Log - Rolled new log segment for 'pings-209' in 3 ms. > [2014-11-11 13:46:48,436] 72954671 [ReplicaFetcherThread-0-4] INFO > kafka.log.Log - Rolled new log segment for 'pings-299' in 2 ms. > [2014-11-11 13:46:48,687] 72954922 [kafka-request-handler-0] INFO > kafka.log.Log - Rolled new log segment for 'pings-506' in 2 ms. > > The "pings" topic in question has 512 partitions, so it does this 512 > times every so often. We have an effective retention period of a bit less > than 30 min, so rolling happens pretty frequently. Still, if I assume worst > case that rolling locks up the process for 2ms and there are 512 rolls > every few minutes, I'd expect halting to happen for about a second at a > time. The graphs seem to indicate much longer dips, but it's hard for me to > know if I'm looking at real data or some sort of artifact. > > Fwiw, the producers are not reporting any errors, so it does not seem like > we're losing data. > > I'm new to Kafka. Should I be worried? If so, how should I be debugging > this? > > Thanks, > Wes > >