Hi,

We have multiple replicas of an application running on a Kubernetes cluster. 
Each application instance runs a stateful Kafka Streams application with an 
in-memory state store (backed by a changelog topic). All instances of the 
Streams app are members of the same consumer group.


Deployments use the “rolling restart” method, i.e. new replica(s) come up 
successfully, and the existing (old) replica(s) are then killed. Because new 
app instances join the consumer group and old app instances leave it, topic 
partitions are rebalanced within the group.


Ultimately, when all instances of the app have completed the rolling restart, 
we see that partitions have been rebalanced an excessive number of times. For 
example, the app has 48 instances, and we observe that each partition (say, 
partition #50) has been reassigned 50 to 57 times, moving across several app 
instances. The total count of partition movements during the entire rolling 
restart is greater than 3000.
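
To put these numbers in perspective, here is a small back-of-the-envelope 
sketch. It uses only figures from this email (more than 3000 movements, 48 
instances, and the partition count of 60 listed under DETAILS below). The 
class and method names are made up for illustration, and since 3000 is a 
lower bound, both results are lower bounds as well.

```java
// Hypothetical helper, for illustration only: averages derived from the
// figures reported in this email.
class RebalanceMath {

    // Average of a total movement count over a number of units
    // (partitions or app instances).
    static double averagePer(long totalMoves, int units) {
        return (double) totalMoves / units;
    }

    public static void main(String[] args) {
        long totalMoves = 3000; // "greater than 3000", so a lower bound
        int partitions = 60;    // total partition count (see DETAILS)
        int instances = 48;     // number of app instances

        // Average movements per partition over the whole rolling restart.
        System.out.println(averagePer(totalMoves, partitions)); // prints 50.0

        // Average movements per restarted app instance.
        System.out.println(averagePer(totalMoves, instances));  // prints 62.5
    }
}
```

The per-partition average of about 50 matches the 50 to 57 moves we observe 
for individual partitions. The per-instance average of about 62 suggests that, 
on average, each instance restart coincides with roughly as many movements as 
there are partitions.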


This excessive rebalancing adds lag to message processing, which affects our 
SLAs, and is creating reliability issues.


So, we are wondering:


(1) is this expected, especially since cooperative rebalancing should ensure 
that only a few partitions get rebalanced?


(2) why would any partition undergo so many rebalances across several app 
instances?


(3) is there some configuration (broker config or client config) that we can 
apply to reduce the total rebalances and partition movements during rolling 
restarts? We cannot consider static membership due to other technical 
constraints.


The runtime and network are extremely stable: no missed heartbeats, session 
timeouts, etc.


DETAILS

-----------

  *   Kafka Broker Version = 2.6

  *   Kafka Streams Client Version = 2.7.0

  *   No. of app instances = 48

  *   No. of stream threads per stream app = 3

  *   Total partition count = 60

  *   Warmup Replicas (max.warmup.replicas) = 5

  *   Standby Replicas (num.standby.replicas) = 2

  *   Probing Rebalance Interval (probing.rebalance.interval.ms) = 300000 (5 minutes)

  *   session.timeout.ms = 10000 (10 seconds)

  *   heartbeat.interval.ms = 3000 (3 seconds)

  *   internal.leave.group.on.close = true

  *   linger.ms = 5

  *   processing.guarantee = at_least_once
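
For reference, the client settings above could be expressed roughly as 
follows. This is only a sketch using a plain java.util.Properties object with 
literal keys, not our actual application code. The class and method names are 
hypothetical, and mapping "No. of stream threads" to num.stream.threads is an 
assumption. Required settings such as application.id and bootstrap.servers are 
not given in this email and are left out.

```java
import java.util.Properties;

// Hypothetical sketch: the settings listed under DETAILS as a Properties object.
class StreamsSettingsSketch {

    static Properties settingsFromEmail() {
        Properties props = new Properties();

        // Kafka Streams settings
        props.setProperty("num.stream.threads", "3"); // assumed key for "stream threads per app"
        props.setProperty("max.warmup.replicas", "5");
        props.setProperty("num.standby.replicas", "2");
        props.setProperty("probing.rebalance.interval.ms", "300000"); // 5 minutes
        props.setProperty("processing.guarantee", "at_least_once");

        // Consumer settings (unprefixed here, as they are listed in the email)
        props.setProperty("session.timeout.ms", "10000");   // 10 seconds
        props.setProperty("heartbeat.interval.ms", "3000"); // 3 seconds
        props.setProperty("internal.leave.group.on.close", "true");

        // Producer setting
        props.setProperty("linger.ms", "5");

        return props;
    }

    public static void main(String[] args) {
        // Print each key/value pair; iteration order is not guaranteed.
        settingsFromEmail().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```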


Any help or information would be greatly appreciated.

Thanks,
Nagendra U M
