[
https://issues.apache.org/jira/browse/KAFKA-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147838#comment-16147838
]
Matthias J. Sax edited comment on KAFKA-5786 at 8/30/17 7:02 PM:
-----------------------------------------------------------------
Thanks for the logs. If I read them correctly, some of your threads miss a
rebalance because state recreation in a previous rebalance took too long. As a
result, they drop out of the consumer group without noticing. When the next
rebalance happens, they try to commit but fail, because they are no longer part
of the group. This issue should be fixed by KAFKA-5152. However, KAFKA-5152
only covers {{CommitFailedException}}, as in your case; a proper fix would be
to not let the thread die on any exception in the first place. We already have
a JIRA for this: KAFKA-5541.
I am going to close this as a duplicate. In 0.11.0.1, the probability of
hitting this issue should be reduced (via KAFKA-5152), and I hope to get
KAFKA-5541, which should deliver the proper fix, into 1.0.
Thanks for reporting the issue! Btw: you can also follow KAFKA-5156 for further
improvements to internal exception handling.
was (Author: mjsax):
Thanks for the logs. If I read them correctly, some of your threads miss a
rebalance because state recreation in a previous rebalance took too long. As a
result, they drop out of the consumer group without noticing. When the next
rebalance happens, they try to commit but fail, because they are no longer part
of the group. This issue should be mitigated by KAFKA-5152. However, a proper
fix would be to not let the thread die in the first place. We already have a
JIRA for this: KAFKA-5541.
I am going to close this as a duplicate. In 0.11.0.1, the probability of
hitting this issue should be reduced (via KAFKA-5152), and I hope to get
KAFKA-5541, which should deliver the proper fix, into 1.0.
Thanks for reporting the issue! Btw: you can also follow KAFKA-5156 for further
improvements to internal exception handling.
> Yet another exception is causing that streamming app is zombie
> --------------------------------------------------------------
>
> Key: KAFKA-5786
> URL: https://issues.apache.org/jira/browse/KAFKA-5786
> Project: Kafka
> Issue Type: Bug
> Reporter: Seweryn Habdank-Wojewodzki
> Attachments: fatal-errors-by-rebalancing.zip
>
>
> An unhandled exception in a streaming app leaves the process in a zombie state.
> {code}
> 2017-08-24 15:17:40 WARN StreamThread:978 - stream-thread [kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3] Unexpected state transition from RUNNING to DEAD.
> 2017-08-24 15:17:40 FATAL StreamProcessor:67 - Caught unhandled exception: stream-thread [kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3] Failed to rebalance.;
> [org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:589),
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553),
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)]
> in thread kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3
> {code}
> The final state of the app is similar to KAFKA-5779, but the exception is
> different and is thrown in a different place.
> The exception should be handled so that the application either tries to
> continue working or quits completely if the error is not recoverable.
> The current situation, in which the application becomes a zombie, is not good.
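
The "quit completely if the error is not recoverable" behaviour requested above can be approximated on the application side with a plain JDK uncaught-exception handler. The following is only a minimal sketch under stated assumptions, not the fix tracked in KAFKA-5541: {{ThreadDeathGuard}}, {{startGuarded}} and {{demo}} are hypothetical names, and the worker merely simulates a dying stream thread. In a real Streams application the comparable hook would presumably be {{KafkaStreams#setUncaughtExceptionHandler}}, and the callback would typically close the instance and exit the JVM rather than just record the error.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class ThreadDeathGuard {

    /**
     * Starts a worker thread. If the thread dies from an uncaught exception,
     * onFatal is invoked with the cause (on the dying thread, before it ends).
     * Hypothetical helper for illustration; not part of any Kafka API.
     */
    static Thread startGuarded(String name, Runnable work, Consumer<Throwable> onFatal) {
        Thread t = new Thread(work, name);
        t.setUncaughtExceptionHandler((thread, error) -> onFatal.accept(error));
        t.start();
        return t;
    }

    /**
     * Simulates a stream thread that fails during a rebalance and returns the
     * message of the exception that the handler observed (null if none).
     */
    static String demo() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread worker = startGuarded(
                "StreamThread-demo",
                () -> { throw new IllegalStateException("Failed to rebalance."); },
                // A real application might log the error, close its streams
                // instance and call System.exit(1) here instead of recording it.
                seen::set);
        try {
            worker.join(); // the handler has run once the thread has terminated
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        Throwable error = seen.get();
        return error == null ? null : error.getMessage();
    }

    public static void main(String[] args) {
        System.out.println("worker died with: " + demo());
    }
}
```

The point of the design is that the death of a single worker thread becomes visible to the rest of the process, so the application can decide to shut down instead of lingering as a zombie.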