We're seeing something funny in one of our production clusters that I cannot explain away. Everything works fine, but as we're ramping up on Kafka, I really want to get at the root cause before we push a ton of traffic through it :)
We have 6 nodes across three DCs in the cluster. Currently it's running a light load of two topics, one with small (KB) messages, one with variable-sized (KB-MB) messages, both with 64 partitions and 3 replicas. All topics, including __consumer_offsets, have been rebalanced with a script we wrote to make sure that the replicas are spread out over the three datacenters and that leadership is evenly balanced, so we can continue to operate if we lose one DC. Producers use Consul to find an initial broker (round-robin through the local DC); consumers use the 0.9.0.1 client.

The funny thing is that in each DC, one broker graphs "normal" JVM heap behavior - a sawtooth from the expected garbage creation/collection cycle - while the other one stays essentially flat. The flat-lining brokers also show less incoming traffic when graphing the OS-level received bytes. Everything else - incoming messages, outgoing messages, etc. - shows up as essentially the same on the graphs.

I've been digging around for a bit, but can't find anything obvious that would cause the difference in memory pressure. Assuming that Kafka brokers pre-allocate buffers, I wouldn't expect much garbage to be generated. Is the flat line the expected behavior and the sawtooth the unexpected one? What could cause the difference?

Thanks for any pointers :-)

--
Cees de Groot
Principal Software Engineer
PagerDuty | pagerduty.com
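
P.S. In case it helps to picture the layout: the rebalance is just the standard reassignment tooling under the hood. A rough sketch of what the plan and commands look like (topic name, broker IDs and ZK host below are placeholders - say brokers 1/2 live in DC A, 3/4 in DC B, 5/6 in DC C):

  {"version": 1,
   "partitions": [
     {"topic": "example-topic", "partition": 0, "replicas": [1, 3, 5]},
     {"topic": "example-topic", "partition": 1, "replicas": [4, 6, 2]}
   ]}

  # apply the plan, then move leadership back onto the preferred replicas
  bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
      --reassignment-json-file plan.json --execute
  bin/kafka-preferred-replica-election.sh --zookeeper zk1:2181

The first broker in each replica list is the preferred leader, so the script rotates which DC comes first per partition to keep leadership spread over all six brokers.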