Hi guys, I'm having an issue with a Kafka Streams app: at some point the consumer logs a leave-group message. It is exactly the same issue someone else described here:
https://stackoverflow.com/questions/61245480/how-to-detect-a-kafka-streams-app-in-zombie-state

The problem is that the stream state keeps reporting that the stream is running even though it is no longer consuming anything, and the stream never rejoins the consumer group. Since my application has only one replica, it stops consuming entirely.

I have a health check on Kubernetes that exposes the stream state so that the pod is restarted when the stream is unhealthy. But because the Kafka Streams state is still running after the consumer leaves the group, the app looks healthy while it is in this zombie state, and I have to restart the pod manually.

Is this a bug? Or is there a way to check the state of the stream's consumer so I can expose it as the health check for my application?

The issue happens at random, usually on Mondays. I'm using Kafka 2.8.1 and my app is written in Kotlin.

This is the message I get before the zombie state. After it there are no exceptions, errors, or any other log output until I restart the pod manually:

Sending LeaveGroup request to coordinator b-3.c4.kafka.us-east-1.amazonaws.com:9098 (id: 2147483644 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.

Thanks for the help.
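
To make the health-check question concrete, here is a minimal sketch of one possible workaround: alongside the stream state, the health endpoint could also check how long ago the app last made progress. This is only an illustration under stated assumptions, not something the Kafka Streams API provides. The class `ProgressLiveness` and its methods are hypothetical names, the idle window would have to be chosen by hand (for example somewhat larger than max.poll.interval.ms), and wiring `markProgress()` into the topology is not shown. It is written in plain Java with no Kafka dependency, so it can be called from Kotlin as-is.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical liveness tracker; not part of the Kafka Streams API. */
public class ProgressLiveness {
    private final long maxIdleMillis;
    private final LongSupplier clock;          // injectable clock, e.g. System::currentTimeMillis
    private final AtomicLong lastProgressMillis;

    public ProgressLiveness(Duration maxIdle, LongSupplier clock) {
        this.maxIdleMillis = maxIdle.toMillis();
        this.clock = clock;
        // Treat construction time as the first sign of progress.
        this.lastProgressMillis = new AtomicLong(clock.getAsLong());
    }

    /** Assumed to be called from the processing path, e.g. once per record. */
    public void markProgress() {
        lastProgressMillis.set(clock.getAsLong());
    }

    /** Healthy only while progress was seen within the allowed idle window. */
    public boolean isHealthy() {
        return clock.getAsLong() - lastProgressMillis.get() <= maxIdleMillis;
    }
}
```

The health endpoint would then report healthy only if the stream state is running and `isHealthy()` returns true. One caveat with this approach: if the input topic is legitimately idle for longer than the window, the check fails even though nothing is wrong, so it only suits topics with steady traffic.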