The most severe issue I've run into is that a poorly timed GC pause can lead to a situation where rebalancing leaves a partition completely unowned. Make sure that rebalance.max.retries * rebalance.backoff.ms is longer than the longest GC pause your consumers experience.
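As a rough sanity check, the product above can be compared against the worst GC pause you have measured. This is only a sketch: the helper names are made up for illustration, and the config values and pause lengths are placeholders rather than recommendations.

```python
def rebalance_retry_window_ms(config):
    """Total time a consumer keeps retrying a rebalance, in milliseconds."""
    return int(config["rebalance.max.retries"]) * int(config["rebalance.backoff.ms"])


def survives_gc_pause(config, worst_gc_pause_ms):
    """True if the retry window is strictly longer than the worst observed GC pause."""
    return rebalance_retry_window_ms(config) > worst_gc_pause_ms


# Placeholder values; substitute your own consumer settings and the
# longest pause taken from your GC logs.
consumer_config = {
    "rebalance.max.retries": "4",
    "rebalance.backoff.ms": "2000",
}

print(rebalance_retry_window_ms(consumer_config))  # 8000
print(survives_gc_pause(consumer_config, 12000))   # False: a 12 s pause outlasts the window
```

If the check fails, raising either setting widens the window; which one to raise depends on how quickly you want the first retries to happen.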
A more common, but generally less severe, pain point is that rebalancing can move partitions around quite a bit because of how the simple, alphabetical dealing out of partitions works. This can cause lots of cache misses in your consumers as they suddenly start seeing a totally different set of keys or, more generally, cause issues with other state in your consumer tier.

-kevin

On Thu, Nov 5, 2015 at 10:15 AM, Prabhjot Bharaj <prabhbha...@gmail.com> wrote:

> Hello Folks,
>
> I am evaluating some failure scenarios during consumer rebalance in the
> high-level consumer.
> The idea of this test is to learn which pain points I need to consider,
> from an operational/maintenance standpoint, when a consumer rebalance
> takes place.
>
> Also, if there are any known issues that you are aware of, or if you have
> hit any issues when high-level consumers in the same consumer group trip
> (in case of failure) or scale out in order to share the load, I request
> you to share your experiences. This will help me have proper procedures
> in place in case a problem happens during consumer rebalance.
>
> Thanks,
> Prabhjot
>