We're seeing something funny in one of our production clusters that I cannot explain away. Everything works fine, but as we're ramping up on Kafka, I really want to get at the root cause before we push a ton of traffic through it :)
We have 6 nodes across three DCs in the cluster. Currently it's running a light load of two topics, one with small (KB) messages, one with variable-sized (KB-MB) messages, both with 64 partitions and 3 replicas. All topics, including __consumer_offsets, have been rebalanced with a script we wrote to make sure that the replicas are spread out over the three datacenters and that leadership is evenly balanced, so we can continue to operate if we lose one DC. Producers use Consul to find an initial broker (round-robin through the local DC); consumers use the 0.9.0.1 client.

The funny thing is that in each DC, one broker graphs "normal" JVM heap behavior - a sawtooth from the expected garbage creation/collection cycle - while the other one stays essentially flat. The flat-lining brokers also show less incoming traffic when graphing the OS-level received bytes. Everything else - incoming messages, outgoing messages, etc. - shows up as essentially the same on the graphs.

I've been digging around for a bit, but can't find anything obvious that would cause the difference in memory pressure. Assuming that Kafka brokers pre-allocate buffers, I wouldn't expect much garbage to be generated. Is the flat line the expected behavior and the sawtooth the unexpected one? What could cause the difference?

Thanks for any pointers :-)

--
Cees de Groot
Principal Software Engineer
PagerDuty | pagerduty.com
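
P.S. In case it helps to picture the layout: the rebalance is just the standard reassignment tooling under the hood. A rough sketch of what the plan and commands look like (topic name, broker IDs and ZK host below are placeholders - say brokers 1/2 live in DC A, 3/4 in DC B, 5/6 in DC C):

  {"version": 1,
   "partitions": [
     {"topic": "example-topic", "partition": 0, "replicas": [1, 3, 5]},
     {"topic": "example-topic", "partition": 1, "replicas": [4, 6, 2]}
   ]}

  # apply the plan, then move leadership back onto the preferred replicas
  bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
      --reassignment-json-file plan.json --execute
  bin/kafka-preferred-replica-election.sh --zookeeper zk1:2181

The first broker in each replica list is the preferred leader, so the script rotates which DC comes first per partition to keep leadership spread over all six brokers.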