Rohit Bobade created KAFKA-15520:
------------------------------------

             Summary: Kafka Streams Stateful Aggregation Rebalancing causing processing to pause on all partitions
                 Key: KAFKA-15520
                 URL: https://issues.apache.org/jira/browse/KAFKA-15520
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 2.6.2
            Reporter: Rohit Bobade

Kafka broker version: 2.8.0
Kafka Streams client version: 2.6.2

I am running Kafka Streams stateful aggregations on a K8s StatefulSet with a persistent volume attached to each pod.

I have set props.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, podName); so that each pod keeps a sticky partition assignment.

I have enabled a standby replica with props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1); and set props.put(StreamsConfig.ACCEPTABLE_RECOVERY_LAG_CONFIG, "0");

However, when pods restart, a rebalance is triggered and processing pauses on all pods until the rebalance and state restoration have completed.

My understanding is that even if there is a rebalance, only the partitions that need to move should be restored, in a cooperative way, without pausing all processing. Processing should also fail over to the standby replica in this case, which would avoid restoring state on other pods.

I have increased the session timeout to 480 seconds and the max poll interval to 15 minutes to minimize rebalances. I also added props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, CooperativeStickyAssignor.class.getName()); to enable the CooperativeStickyAssignor.

Could someone please help if I'm missing something?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
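
For reference, the settings described in this report can be collected in one place. The following is a minimal sketch, not the reporter's actual code: the class name StreamsRestartConfig, the build helper, and the HOSTNAME lookup with its fallback value are hypothetical. Plain string keys are used so the sketch compiles without the Kafka client jars; each comment names the constant from the report that the key corresponds to. Required settings such as the application id and bootstrap servers are not given in the report and are left out.

```java
import java.util.Properties;

public class StreamsRestartConfig {

    // Hypothetical helper: builds only the settings mentioned in the report.
    public static Properties build(String podName) {
        Properties props = new Properties();
        // ConsumerConfig.GROUP_INSTANCE_ID_CONFIG: one stable member id per pod
        props.put("group.instance.id", podName);
        // StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG
        props.put("num.standby.replicas", 1);
        // StreamsConfig.ACCEPTABLE_RECOVERY_LAG_CONFIG, passed as a string as in the report
        props.put("acceptable.recovery.lag", "0");
        // ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG: 480 seconds
        props.put("session.timeout.ms", 480 * 1000);
        // ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG: 15 minutes
        props.put("max.poll.interval.ms", 15 * 60 * 1000);
        // ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG
        props.put("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return props;
    }

    public static void main(String[] args) {
        // Assumption: the pod name is read from HOSTNAME; the fallback is a placeholder.
        String podName = System.getenv().getOrDefault("HOSTNAME", "streams-app-0");
        build(podName).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Printing the resulting properties at startup on each pod is one way to confirm that every instance runs with the same values and a distinct group.instance.id.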