[
https://issues.apache.org/jira/browse/KAFKA-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287713#comment-14287713
]
Alexey Ozeritskiy commented on KAFKA-1804:
------------------------------------------
We last saw the bug while restarting the network switch on a cluster of
20 machines. The kafka-network-threads died on more than half of the machines,
and as a result the cluster became unavailable. We are trying to find the
specific steps that reproduce the problem.
> Kafka network thread lacks top exception handler
> ------------------------------------------------
>
> Key: KAFKA-1804
> URL: https://issues.apache.org/jira/browse/KAFKA-1804
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8.2
> Reporter: Oleg Golovin
> Priority: Critical
>
> We have run into a problem where some Kafka network threads fail, so that
> jstack attached to the Kafka process shows fewer threads than we defined in
> our Kafka configuration. API requests handled by a failed thread then get
> stuck without a response.
> There were no error messages in the log regarding the thread failure.
> We examined the Kafka code and found that there is no top-level try-catch
> block in the network thread code, which could at least log possible errors.
> Could you add a top-level try-catch block for the network thread, which
> should recover the network thread in case of an exception?
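
The request above can be illustrated with a minimal sketch of a top-level
handler around a thread's work loop. This is not Kafka's actual network thread
code: the class GuardedLoop, the interface Step, and the method failureCount
are hypothetical names used only for illustration, and the loop is bounded by
an assumed maxIterations so the example terminates.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a run loop that logs failures instead of dying silently. */
public class GuardedLoop implements Runnable {

    /** One unit of work, standing in for a single select/poll cycle. */
    public interface Step {
        void run() throws Exception;
    }

    private final Step step;
    private final int maxIterations; // assumption: bounded so the sketch terminates
    private final AtomicInteger failures = new AtomicInteger();

    public GuardedLoop(Step step, int maxIterations) {
        this.step = step;
        this.maxIterations = maxIterations;
    }

    @Override
    public void run() {
        for (int i = 0; i < maxIterations; i++) {
            if (Thread.currentThread().isInterrupted()) {
                return; // orderly shutdown was requested
            }
            try {
                step.run();
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop; this is not a failure.
                Thread.currentThread().interrupt();
                return;
            } catch (Throwable t) {
                // Top-level handler: record and log the error, then continue
                // with the next iteration so the thread stays alive.
                // Real code may prefer to rethrow fatal errors such as
                // OutOfMemoryError instead of continuing.
                failures.incrementAndGet();
                System.err.println("Uncaught error in loop iteration: " + t);
            }
        }
    }

    /** Number of iterations that ended in a caught error. */
    public int failureCount() {
        return failures.get();
    }
}
```

With this shape, a failing iteration is at least logged and the thread keeps
processing later requests. For threads whose loop cannot be wrapped this way,
Thread.setUncaughtExceptionHandler can log the error, although it does not
revive the thread.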