Hi team,
We have a Kafka Streams application running with Kafka clients 3.8.1. We have hit a strange issue and have not been able to find the root cause so far, so any help would be appreciated.

We noticed the issue because the lag on one partition kept increasing. When we checked the stream state, we found that one node was stuck in rebalancing. We then checked the logs and found only 2 log entries:

2025-11-12T01:38:52.315+0800|WARN|kafka-coordinator-heartbeat-thread|Stream-xxxx|o.a.k.c.c.i.ConsumerCoordinator.handlerPollTimeoutExpiry[AbstractCoordinator.java:1147]|[Consumer clientId=Stream-xxxx-StreamThread-11-consumer, groupId=Stream-xxxx] consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.

2025-11-12T01:39:01.382+0800|ERROR|Stream-xxxx-StreamThread-2|o.a.k.s.p.internals.StreamTask.closeStateManager[StateManagerUtil.java:149]|stream-thread [Stream-xxxx-StreamThread-2] task [1_11] Failed to acquire lock while closing the state store for Active task 1_11

I'm not sure whether these logs are related to the issue, but:
(1) the log timestamps are almost the same as the time when the partition lag started increasing;
(2) the partition with increasing lag is 11, which matches task 1_11 mentioned in the error log.

I have also searched the existing JIRA issues to see whether this is a known issue. It looks a lot like:
(1) KAFKA-16025: but shouldn't this one already be fixed in 3.8.1?
(2) KAFKA-18355: but that bug says the new thread keeps throwing the lock exception, while I have only one error log line related to the lock.

It seems that the client hit a problem and tried to change state from active to rebalancing, but failed before reaching the part where it requests to leave the consumer group. As a result, no rebalance happened, and no consumer is actually processing the partition's data...

The issue has already happened on 3 different setups, but unfortunately all of them are production environments, so there is not much debug information I can collect for now :(

Looking forward to your reply.

Thanks,
Chen
