Re: spikes in producer requests/sec

Magnus Edenhill Tue, 11 Nov 2014 12:50:28 -0800

Hi Wes,

are you monitoring librdkafka statistics as well?
If so, are there any correlating spikes in the per-broker and per-partition
statistics?
Such as:
 - brokers.<NAME>.rtt.avg  <--- broker round-trip-time (latency)
 - brokers.<NAME>.waitresp_cnt  <-- requests in flight
 - topics.<TOPIC>.partitions.<N>.msgq_cnt   <-- internal message queue
 - topics.<TOPIC>.partitions.<N>.xmitq_cnt   <-- transmit queue


Regards,
Magnus



2014-11-11 19:50 GMT+01:00 Wes Chow <w...@chartbeat.com>:

>
> We're seeing periodic spikes in req/sec rates across our nodes. Our
> cluster is 10 nodes, and the topic has a replication factor of 3. We push
> around 200k messages / sec into Kafka.
>
>
> The machines are running the most recent version of Kafka and we're
> connecting via librdkafka. pingstream02-10 are using the CMS garbage
> collector, but I switched pingstream01 to use G1GC under the theory that
> maybe these were GC pauses. The graph shows that likely didn't improve the
> situation.
>
> My next thought is that maybe this is the effect of log rolling. Checking
> in the logs, I see a lot of this:
>
> [2014-11-11 13:46:45,836] 72952071 [ReplicaFetcherThread-0-7] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-342' in 3 ms.
> [2014-11-11 13:46:47,116] 72953351 [kafka-request-handler-0] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-186' in 2 ms.
> [2014-11-11 13:46:48,155] 72954390 [ReplicaFetcherThread-0-8] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-253' in 3 ms.
> [2014-11-11 13:46:48,408] 72954643 [ReplicaFetcherThread-0-4] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-209' in 3 ms.
> [2014-11-11 13:46:48,436] 72954671 [ReplicaFetcherThread-0-4] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-299' in 2 ms.
> [2014-11-11 13:46:48,687] 72954922 [kafka-request-handler-0] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-506' in 2 ms.
>
> The "pings" topic in question has 512 partitions, so it does this 512
> times every so often. We have an effective retention period of a bit less
> than 30 min, so rolling happens pretty frequently. Still, if I assume worst
> case that rolling locks up the process for 2ms and there are 512 rolls
> every few minutes, I'd expect halting to happen for about a second at a
> time. The graphs seem to indicate much longer dips, but it's hard for me to
> know if I'm looking at real data or some sort of artifact.
>
> Fwiw, the producers are not reporting any errors, so it does not seem like
> we're losing data.
>
> I'm new to Kafka. Should I be worried? If so, how should I be debugging
> this?
>
> Thanks,
> Wes
>
>

Re: spikes in producer requests/sec

Reply via email to