I've got some consumers under significant GC pressure, and as a result their ZK sessions expire and the consumers never recover. After the ZK session expiration I see a number of rebalance failures in the log, followed by silence (and partitions that are no longer being consumed).
My hypothesis is that, since a GC pause is global to the JVM, multiple ConsumerConnectors will have their sessions expire at the same time and then go through synchronized rebalance/backoff cycles. Since a rebalance fails if new consumers join mid-rebalance, the expired connectors will keep colliding with each other every time they retry. Is this hypothesis crazy? If so, is there a more likely explanation? If not, how might I avoid this when the JVM is under GC pressure? Thanks in advance.
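To make the lockstep argument concrete, here is a toy Java model of it. This is a hypothetical sketch, not Kafka's ZookeeperConsumerConnector logic: the class name, the method name, and all timing values (attemptMs, backoffMs, maxJitterMs, maxRetries) are made-up illustrations, and "two overlapping attempts both fail" is an assumed stand-in for "a consumer joined mid-rebalance". Random jitter on the backoff is a swapped-in idea for breaking the synchronization, not a confirmed Kafka setting.

```java
import java.util.PriorityQueue;
import java.util.Random;

/**
 * Hypothetical toy model of the collision hypothesis. It does NOT use or mirror
 * Kafka's real consumer code; the timing parameters are illustrative assumptions.
 * Each connector makes a rebalance attempt lasting attemptMs. Two attempts that
 * overlap in time both fail (standing in for "a consumer joined mid-rebalance").
 * A failed connector waits backoffMs plus optional random jitter, then retries,
 * giving up after maxRetries retries.
 */
public final class RebalanceCollisionModel {

    private record Event(long timeMs, boolean isEnd, int connector) {}

    /** Returns how many connectors complete a rebalance attempt without a collision. */
    public static int successfulConnectors(long[] startMs, long attemptMs, long backoffMs,
                                           long maxJitterMs, int maxRetries, long seed) {
        Random rng = new Random(seed);
        int n = startMs.length;
        boolean[] inProgress = new boolean[n];
        boolean[] collided = new boolean[n];
        int[] retriesUsed = new int[n];
        int successes = 0;

        // Order events by time; at equal times process END before START, so an attempt
        // that starts exactly when another ends does not count as overlapping.
        PriorityQueue<Event> queue = new PriorityQueue<>((a, b) ->
                a.timeMs() != b.timeMs()
                        ? Long.compare(a.timeMs(), b.timeMs())
                        : Boolean.compare(b.isEnd(), a.isEnd()));
        for (int i = 0; i < n; i++) {
            queue.add(new Event(startMs[i], false, i));
        }

        while (!queue.isEmpty()) {
            Event e = queue.poll();
            int c = e.connector();
            if (!e.isEnd()) {
                // A starting attempt collides with every attempt already in flight.
                for (int other = 0; other < n; other++) {
                    if (inProgress[other]) {
                        collided[other] = true;
                        collided[c] = true;
                    }
                }
                inProgress[c] = true;
                queue.add(new Event(e.timeMs() + attemptMs, true, c));
            } else {
                inProgress[c] = false;
                if (!collided[c]) {
                    successes++;
                } else if (retriesUsed[c] < maxRetries) {
                    retriesUsed[c]++;
                    collided[c] = false;
                    // maxJitterMs == 0 gives a fixed, fully synchronized backoff.
                    long jitter = (long) (rng.nextDouble() * maxJitterMs);
                    queue.add(new Event(e.timeMs() + backoffMs + jitter, false, c));
                }
                // Otherwise the connector has exhausted its retries and stays idle.
            }
        }
        return successes;
    }

    public static void main(String[] args) {
        long[] sameInstant = {0, 0, 0}; // all sessions expired by the same GC pause
        System.out.println("fixed backoff:    "
                + successfulConnectors(sameInstant, 100, 2000, 0, 4, 1L) + " of 3");
        System.out.println("jittered backoff: "
                + successfulConnectors(sameInstant, 100, 2000, 2000, 4, 1L) + " of 3");
    }
}
```

With identical start times and a fixed backoff the model reports 0 of 3, because every retry lands at the same instant and collides again until the retries run out; with jitter the exact count depends on the seed, but the connectors usually drift apart and complete. Whether the real consumer behaves like this depends on how its rebalance and ZK watch handling actually interleave, which the model deliberately ignores.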