[ https://issues.apache.org/jira/browse/KAFKA-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raman Gupta resolved KAFKA-10229.
---------------------------------
Resolution: Invalid
Not an issue with Kafka -- the code run by the stream was blocked.
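
For anyone who hits the same symptom (processing stops silently and the member is later dropped from the group), a thread dump taken while the stream is stalled should show where the processing code is stuck. Below is a minimal sketch of that idea using only the JDK. `StalledThreadReport` and `findStalledThreads` are hypothetical names, not part of Kafka, and matching on the fragment "StreamThread" is an assumption about how the stream threads are named in your application. It is also only a heuristic: a healthy thread can be briefly waiting too, so look for the same frames across several samples.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Kafka: reports threads whose name contains
// a given fragment and that are currently blocked or waiting.
final class StalledThreadReport {

    static List<String> findStalledThreads(String nameFragment) {
        List<String> report = new ArrayList<>();
        // Snapshot all live threads with their stack traces (no lock details).
        ThreadInfo[] infos =
                ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info == null || !info.getThreadName().contains(nameFragment)) {
                continue;
            }
            Thread.State state = info.getThreadState();
            boolean notRunning = state == Thread.State.BLOCKED
                    || state == Thread.State.WAITING
                    || state == Thread.State.TIMED_WAITING;
            if (!notRunning) {
                continue;
            }
            // Thread name, state, and the top few frames, which usually
            // point at the call the thread is stuck in.
            StringBuilder sb = new StringBuilder();
            sb.append(info.getThreadName()).append(" [").append(state).append("]");
            StackTraceElement[] stack = info.getStackTrace();
            for (int i = 0; i < Math.min(stack.length, 5); i++) {
                sb.append(System.lineSeparator()).append("    at ").append(stack[i]);
            }
            report.add(sb.toString());
        }
        return report;
    }
}
```

Calling `StalledThreadReport.findStalledThreads("StreamThread")` from a scheduled task and logging the result every few seconds is one way to use it. If the same application frame shows up at the top of the stack in consecutive samples, that call is the likely culprit.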
> Kafka stream dies for no apparent reason, no errors logged on client or server
> ------------------------------------------------------------------------------
>
> Key: KAFKA-10229
> URL: https://issues.apache.org/jira/browse/KAFKA-10229
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 2.4.1
> Reporter: Raman Gupta
> Priority: Major
>
> My broker and clients are both on 2.4.1, and I'm currently running a single
> broker. I have a Kafka Streams application with exactly-once processing
> turned on and an uncaught exception handler defined on the client. I noticed
> that one stream was lagging. Upon investigation, I saw that the consumer
> group was empty.
> On restarting the consumers, the consumer group re-established itself, but
> after about 8 minutes, the group became empty again. There is nothing logged
> on the client side about any stream errors, despite the existence of an
> uncaught exception handler.
> In the broker logs, I see the following about 8 minutes after the clients
> restart and the stream goes to RUNNING state:
> ```
> [2020-07-02 17:34:47,033] INFO [GroupCoordinator 0]: Member
> cis-d7fb64c95-kl9wl-1-630af77f-138e-49d1-b76a-6034801ee359 in group
> produs-cisFileIndexer-stream has failed, removing it from the group
> (kafka.coordinator.group.GroupCoordinator)
> [2020-07-02 17:34:47,033] INFO [GroupCoordinator 0]: Preparing to rebalance
> group produs-cisFileIndexer-stream in state PreparingRebalance with old
> generation 228 (__consumer_offsets-3) (reason: removing member
> cis-d7fb64c95-kl9wl-1-630af77f-138e-49d1-b76a-6034801ee359 on heartbeat
> expiration) (kafka.coordinator.group.GroupCoordinator)
> ```
> So according to this, the consumer heartbeat expired. I don't know why
> this would be: logging shows that the stream was running and processing
> messages normally, and then it simply stopped processing anything about 4
> minutes before it died, with no apparent errors and nothing logged via the
> uncaught exception handler.
> It doesn't appear to be related to any specific poison-pill message:
> restarting the stream causes it to reprocess a bunch more messages from the
> backlog, and then die again approximately 8 minutes later. At the time of the
> last message consumed by the stream, there are no `INFO`-level or above logs
> either in the client or the broker, or any errors whatsoever. The stream
> consumption simply stops.
> There are two consumers -- even if I limit consumption to only a single
> consumer, the same thing happens.
> The runtime environment is Kubernetes.