Correct - heavy client GC leads to numerous problems. There are two things you can do:

1) Tune the client JVM better to get GC to a more reasonable level
2) Increase the zookeeper session timeout value (this is generally a
   work-around for #1, but it can buy you time to dig into it)

--
Dave DeMaagd | S'aite Reliability Engineering, Y'all
ddema...@linkedin.com | 818 262 7958

(cl...@breyman.com - Mon, Apr 14, 2014 at 12:41:15PM -0700)
> I've got some consumers under decent GC pressure and, as a result, they are
> having ZK sessions expire and the consumers never recover. I see a number
> of rebalance failures in the log after the ZK session expiration followed
> by silence (and consumed partitions).
>
> My hypothesis is that, since the GC pause is global to the JVM, I'll have
> multiple ConsumerConnectors get expired at the same time and have
> synchronized rebalance/backoff cycles. Since rebalance fails if new
> consumers join mid balance, the multiple expired connectors will always
> collide with each other while attempting to rebalance.
>
> Is this hypothesis crazy? If not, is there a more likely situation? If the
> hypothesis isn't crazy, how might I avoid this when the JVM is under GC
> pressure?
>
> Thanks in advance.
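
For suggestion 2 in the reply (raising the ZooKeeper session timeout), here is a minimal sketch of the consumer properties involved. It assumes the 0.8-era high-level consumer, where the setting is named zookeeper.session.timeout.ms. The class name, connect string, group id, and the 30000 ms value are hypothetical placeholders, not recommendations from the thread. The ZooKeeper server also bounds the timeout it will grant (by default to 20 times its tickTime), so a larger client value may not take full effect without a server-side change.

```java
import java.util.Properties;

public class ConsumerTimeoutSketch {
    // Builds consumer properties with a longer ZooKeeper session timeout.
    // The property names are assumed to match the 0.8 high-level consumer;
    // every value passed in is an illustrative placeholder.
    public static Properties consumerProps(String zkConnect, String groupId, int sessionTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", zkConnect);
        props.setProperty("group.id", groupId);
        // A session survives a GC pause only if the pause is shorter than this timeout.
        props.setProperty("zookeeper.session.timeout.ms", Integer.toString(sessionTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps("localhost:2181", "example-group", 30000);
        props.list(System.out);
    }
}
```

The resulting Properties object would be passed to the consumer's configuration in the usual way. As the reply notes, this only buys time; the GC pauses themselves still need tuning.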