[ 
https://issues.apache.org/jira/browse/KAFKA-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287713#comment-14287713
 ] 

Alexey Ozeritskiy commented on KAFKA-1804:
------------------------------------------

We last saw the bug while restarting the network switch on a cluster of 
20 machines. The kafka-network-threads died on more than half of the 
machines. As a result, the cluster became unavailable. We are trying to find 
the specific steps that reproduce the problem.


> Kafka network thread lacks top exception handler
> ------------------------------------------------
>
>                 Key: KAFKA-1804
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1804
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.2
>            Reporter: Oleg Golovin
>            Priority: Critical
>
> We have run into a problem where some Kafka network threads fail, so that 
> jstack attached to the Kafka process shows fewer threads than we defined in 
> our Kafka configuration. As a result, API requests handled by a failed 
> thread get stuck without a response.
> There were no error messages in the log regarding the thread failure.
> We examined the Kafka code and found that there is no top-level try-catch 
> block in the network thread code, which could at least log possible errors.
> Could you add a top-level try-catch block for the network thread, which 
> should recover the network thread in case of an exception?
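
The request above could be sketched roughly as follows. This is only an
illustration of a top-level try-catch around a thread's run loop, not Kafka's
actual network thread code: the class GuardedLoop, its methods, and the
"iteration" callback are hypothetical names made up for this example, and
whether it is safe to keep looping after every kind of Throwable is an
assumption that would need review.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for a network thread's run loop (not a Kafka class). */
class GuardedLoop implements Runnable {
    private static final Logger LOG = Logger.getLogger(GuardedLoop.class.getName());

    // One pass of the poll/process cycle; supplied by the caller in this sketch.
    private final Runnable iteration;
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final AtomicInteger failures = new AtomicInteger();

    GuardedLoop(Runnable iteration) {
        this.iteration = iteration;
    }

    @Override
    public void run() {
        while (running.get()) {
            try {
                iteration.run();
            } catch (Throwable t) {
                // Top-level handler: record and log the failure instead of
                // letting the thread die silently, then keep serving.
                // Assumption: continuing is acceptable for the error seen;
                // fatal errors such as OutOfMemoryError may need other handling.
                failures.incrementAndGet();
                LOG.log(Level.SEVERE, "Uncaught error in network thread loop; continuing", t);
            }
        }
    }

    /** Asks the loop to stop after the current iteration. */
    void shutdown() {
        running.set(false);
    }

    /** Number of iterations that ended with an uncaught Throwable. */
    int failureCount() {
        return failures.get();
    }
}
```

With a wrapper like this, an exception in one iteration shows up in the log
and the thread stays alive, so the thread count seen in jstack would still
match the configured number of network threads.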



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
