I've got some consumers under significant GC pressure and, as a result,
their ZK sessions expire and the consumers never recover. After the ZK
session expiration I see a number of rebalance failures in the log,
followed by silence (and unconsumed partitions).

My hypothesis is that, since the GC pause is global to the JVM, multiple
ConsumerConnectors in that JVM expire at the same time and then go through
synchronized rebalance/backoff cycles. Since a rebalance fails if new
consumers join mid-rebalance, the expired connectors will always collide
with each other while attempting to rebalance.
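
To make the hypothesis concrete, here is a toy model in Python. It is only a sketch of my reasoning, not Kafka's actual rebalance algorithm: the function name, the timing numbers, and the rule "an attempt fails if another attempt starts within one rebalance duration of it" are all assumptions I made up for illustration. It also ignores the fact that connectors which have already settled may be pulled into further rebalances.

```python
import random


def rounds_until_settled(n_connectors, backoff_ms=2000, jitter_ms=0,
                         rebalance_ms=500, max_retries=4, seed=0):
    """Toy model (hypothetical): return the round in which the last
    connector rebalances cleanly, or None if retries run out first."""
    rng = random.Random(seed)
    # All sessions expire during the same GC pause, so every connector
    # starts its first rebalance attempt at the same instant.
    next_attempt = {c: 0.0 for c in range(n_connectors)}
    for round_no in range(1, max_retries + 1):
        starts = dict(next_attempt)
        # Assumed collision rule: an attempt fails if any other pending
        # attempt starts within rebalance_ms of it.
        failed = {
            c for c, t in starts.items()
            if any(o != c and abs(starts[o] - t) < rebalance_ms
                   for o in starts)
        }
        if not failed:
            return round_no
        # Connectors that succeeded drop out; the rest back off and retry.
        # With jitter_ms=0 every failed connector waits exactly as long.
        next_attempt = {
            c: starts[c] + rebalance_ms + backoff_ms
               + rng.uniform(0, jitter_ms)
            for c in failed
        }
    return None


if __name__ == "__main__":
    print("fixed backoff:   ", rounds_until_settled(3, jitter_ms=0))
    print("jittered backoff:", rounds_until_settled(3, jitter_ms=3000))
```

In this model, connectors that all start together and use an identical backoff collide in every round until the retries are exhausted, which matches the "failures followed by silence" pattern I'm seeing. Adding random jitter to the backoff gives them a chance to drift apart.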

Is this hypothesis crazy? If not, is there a more likely situation? If the
hypothesis isn't crazy, how might I avoid this when the JVM is under GC
pressure?

Thanks in advance.
