[jira] [Comment Edited] (KAFKA-5786) Yet another exception is causing that streamming app is zombie

Matthias J. Sax (JIRA) Wed, 30 Aug 2017 12:03:31 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147838#comment-16147838
 ]


Matthias J. Sax edited comment on KAFKA-5786 at 8/30/17 7:02 PM:
-----------------------------------------------------------------

Thanks for the logs. If I read them correctly, some of your threads miss a 
rebalance because state recreation in a previous rebalance takes too long. As 
a result, they drop out of the consumer group without noticing. When the next 
rebalance happens, they try to commit but fail, because they are no longer 
part of the group. This issue should be fixed by KAFKA-5152. However, 
KAFKA-5152 only covers {{CommitFailedException}}, as in your case; a proper 
fix would be to not let the thread die on any exception in the first place. 
We already have a JIRA for this: KAFKA-5541

I am going to close this as a duplicate. In 0.11.0.1, the probability that you 
hit this issue should be reduced (via KAFKA-5152), and I hope to get 
KAFKA-5541, which should deliver the proper fix, into 1.0.

Thanks for reporting the issue! Btw: you can also follow KAFKA-5156 for further 
improvements on internal exception handling.
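
Until the proper fix lands, the "continue or quit, but do not linger" behaviour 
requested in the description can be approximated at the application level. The 
sketch below is plain Java and only an illustration, not Kafka code: 
ZombieGuard and runAndCapture are hypothetical names, and the worker is an 
ordinary thread standing in for a stream thread. Kafka Streams offers a 
comparable hook through KafkaStreams#setUncaughtExceptionHandler, which is 
presumably what produced the "Caught unhandled exception" line in the log.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper (not part of Kafka): detects that a worker thread died
// from an uncaught exception so the application can stop instead of idling.
final class ZombieGuard {

    // Runs `work` on its own thread and returns the exception that killed
    // the thread, or null if the thread finished normally.
    static Throwable runAndCapture(Runnable work) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(work, "StreamThread-sketch");
        // The handler runs on the dying thread just before it terminates.
        worker.setUncaughtExceptionHandler((thread, error) -> failure.set(error));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for the caller.
            Thread.currentThread().interrupt();
        }
        return failure.get();
    }

    public static void main(String[] args) {
        // Simulated failure; the message mirrors the one in the reported log.
        Throwable failure = runAndCapture(() -> {
            throw new IllegalStateException("Failed to rebalance.");
        });
        if (failure != null) {
            // Assumption: a real application would stop the process here,
            // for example with a non-zero exit code, so a supervisor can
            // restart it.
            System.err.println("Worker died: " + failure.getMessage()
                    + " -> stopping the application");
        }
    }
}
```

The point of the design is that a dead worker is surfaced to the code that owns 
the process lifecycle, so the decision to restart or exit is made explicitly 
rather than leaving the JVM running with no active stream threads.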


was (Author: mjsax):
Thanks for the logs. If I read them correctly, some of your threads miss a 
rebalance because state recreation in a previous rebalance takes too long. As 
a result, they drop out of the consumer group without noticing. When the next 
rebalance happens, they try to commit but fail, because they are no longer 
part of the group. This issue should be mitigated by KAFKA-5152. However, a 
proper fix would be to not let the thread die in the first place. We already 
have a JIRA for this: KAFKA-5541

I am going to close this as a duplicate. In 0.11.0.1, the probability that you 
hit this issue should be reduced (via KAFKA-5152), and I hope to get 
KAFKA-5541, which should deliver the proper fix, into 1.0.

Thanks for reporting the issue! Btw: you can also follow KAFKA-5156 for further 
improvements on internal exception handling.

> Yet another exception is causing that streamming app is zombie
> --------------------------------------------------------------
>
>                 Key: KAFKA-5786
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5786
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Seweryn Habdank-Wojewodzki
>         Attachments: fatal-errors-by-rebalancing.zip
>
>
> An unhandled exception in the streaming app leaves the process in a zombie state.
> {code}
> 2017-08-24 15:17:40 WARN  StreamThread:978 - stream-thread 
> [kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3] 
> Unexpected state transition from RUNNING to DEAD.
> 2017-08-24 15:17:40 FATAL StreamProcessor:67 - Caught unhandled exception: 
> stream-thread 
> [kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3] Failed 
> to rebalance.; 
> [org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:589),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)]
>  in thread kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3
> {code}
> The final state of the app is similar to KAFKA-5779, but the exception and 
> its location are different.
> The exception should be handled so that the application either tries to 
> continue working or quits completely if the error is not recoverable.
> The current situation, in which the application becomes a zombie, is not good.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
