The most severe issue I've run into is that a poorly timed GC pause can
leave a partition completely unowned after a rebalance.  It's important to
make sure that rebalance.max.retries * rebalance.backoff.ms is longer than
any GC pause that your consumers experience.
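
As a rough sanity check, the rule above is just a product compared against
your worst observed pause. The sketch below is illustrative only: the helper
names are made up, and the numbers are example values rather than
recommendations, so substitute your own settings and GC measurements.

```python
def rebalance_retry_window_ms(max_retries, backoff_ms):
    """Approximate time a consumer keeps retrying a rebalance before giving up.

    max_retries corresponds to rebalance.max.retries and backoff_ms to
    rebalance.backoff.ms; this helper itself is hypothetical.
    """
    return max_retries * backoff_ms


def outlasts_gc_pause(max_retries, backoff_ms, worst_gc_pause_ms):
    """True if the retry window is longer than the worst GC pause observed."""
    return rebalance_retry_window_ms(max_retries, backoff_ms) > worst_gc_pause_ms


# Example values only: a 12 s worst-case pause measured from GC logs.
print(outlasts_gc_pause(4, 2000, 12000))   # False: 8 s window < 12 s pause
print(outlasts_gc_pause(10, 2000, 12000))  # True: 20 s window > 12 s pause
```

If the check fails, raise either setting (or reduce the pause) until the
window comfortably exceeds the longest pause you see.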

A more common, but generally less severe, pain point is that rebalancing
can move partitions around quite a bit because of how the simple,
alphabetical dealing out of partitions works.  This can result in lots of
cache misses in your consumers as they suddenly start seeing a totally
different set of keys, or, more generally, cause issues with other state in
your consumer tier.
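
To make the churn concrete, here is a simplified model of that kind of
sorted, range-style dealing. It is not the actual Kafka assignment code;
range_assign and moved_partitions are hypothetical helpers, and the consumer
ids and partition count are invented for the example.

```python
def range_assign(partitions, consumers):
    """Deal sorted partitions out to sorted consumers in contiguous ranges."""
    parts = sorted(partitions)
    members = sorted(consumers)
    per, extra = divmod(len(parts), len(members))
    assignment = {}
    for i, member in enumerate(members):
        # The first `extra` consumers each take one additional partition.
        start = per * i + min(i, extra)
        count = per + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
    return assignment


def moved_partitions(before, after):
    """Partitions whose owner differs between two assignments."""
    owner_before = {p: c for c, ps in before.items() for p in ps}
    owner_after = {p: c for c, ps in after.items() for p in ps}
    return sorted(p for p in owner_before if owner_after.get(p) != owner_before[p])


partitions = range(12)
before = range_assign(partitions, ["c1", "c2", "c3"])
# A new consumer whose id happens to sort first joins the group.
after = range_assign(partitions, ["c0", "c1", "c2", "c3"])
print(moved_partitions(before, after))  # [0, 1, 2, 4, 5, 8]
```

In this toy case six of the twelve partitions change owner, even though the
new consumer only needs three, so existing consumers lose warm state for
partitions they hand to each other.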

-kevin


On Thu, Nov 5, 2015 at 10:15 AM, Prabhjot Bharaj <prabhbha...@gmail.com>
wrote:

> Hello Folks,
>
> I am evaluating some failure scenarios during consumer rebalance in the
> high-level consumer.
> The idea of this test is to learn what the pain points are, from an
> operational/maintenance standpoint, that I need to consider when a consumer
> rebalance takes place.
>
> Also, if you are aware of any known issues, or have hit any issues when
> high-level consumers in the same consumer group trip (in case of failure)
> or scale out in order to share the load, please share your experiences.
> This will help me put proper procedures in place in case a problem occurs
> during a consumer rebalance.
>
> Thanks,
> Prabhjot
>
