[ https://issues.apache.org/jira/browse/KAFKA-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279396#comment-14279396 ]
Alexey Ozeritskiy edited comment on KAFKA-1804 at 1/16/15 7:44 AM:
-------------------------------------------------------------------

We've written a simple patch for kafka-network-thread:
{code:java}
  override def run(): Unit = {
    try {
      iteration() // = the original run()
    } catch {
      case e: Throwable =>
        error("ERROR IN NETWORK THREAD: %s".format(e), e)
        Runtime.getRuntime.halt(1)
    }
  }
{code}
and got the following trace:
{code}
[2015-01-15 23:04:08,537] ERROR ERROR IN NETWORK THREAD: java.util.NoSuchElementException: None.get (kafka.network.Processor)
java.util.NoSuchElementException: None.get
        at scala.None$.get(Option.scala:313)
        at scala.None$.get(Option.scala:311)
        at kafka.network.ConnectionQuotas.dec(SocketServer.scala:544)
        at kafka.network.AbstractServerThread.close(SocketServer.scala:165)
        at kafka.network.AbstractServerThread.close(SocketServer.scala:157)
        at kafka.network.Processor.close(SocketServer.scala:394)
        at kafka.network.Processor.processNewResponses(SocketServer.scala:426)
        at kafka.network.Processor.iteration(SocketServer.scala:328)
        at kafka.network.Processor.run(SocketServer.scala:381)
        at java.lang.Thread.run(Thread.java:745)
{code}


was (Author: aozeritsky):
We've written the simple patch for kafka-network-thread:
{code:java}
  override def run(): Unit = {
    try {
      original_run()
    } catch {
      case e: Throwable =>
        error("ERROR IN NETWORK THREAD: %s".format(e), e)
        Runtime.getRuntime.halt(1)
    }
  }
{code}
and got the following trace:
{code}
[2015-01-15 23:04:08,537] ERROR ERROR IN NETWORK THREAD: java.util.NoSuchElementException: None.get (kafka.network.Processor)
java.util.NoSuchElementException: None.get
        at scala.None$.get(Option.scala:313)
        at scala.None$.get(Option.scala:311)
        at kafka.network.ConnectionQuotas.dec(SocketServer.scala:544)
        at kafka.network.AbstractServerThread.close(SocketServer.scala:165)
        at kafka.network.AbstractServerThread.close(SocketServer.scala:157)
        at kafka.network.Processor.close(SocketServer.scala:394)
        at kafka.network.Processor.processNewResponses(SocketServer.scala:426)
        at kafka.network.Processor.iteration(SocketServer.scala:328)
        at kafka.network.Processor.run(SocketServer.scala:381)
        at java.lang.Thread.run(Thread.java:745)
{code}

> Kafka network thread lacks top exception handler
> ------------------------------------------------
>
>                 Key: KAFKA-1804
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1804
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Oleg Golovin
>
> We have run into a problem where some Kafka network threads may fail, so that
> jstack attached to the Kafka process showed fewer threads than we had defined in
> our Kafka configuration. This leaves API requests handled by the failed thread
> stuck without a response.
> There were no error messages in the log regarding the thread failure.
> We examined the Kafka code and found that there is no top-level try-catch block in
> the network thread code, which could at least log possible errors.
> Could you add a top-level try-catch block for the network thread, which should
> recover the network thread in case of an exception?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
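
For illustration only, the recovering top-level handler requested in the issue description could look roughly like the sketch below. It is not Kafka's actual Processor code: GuardedLoop, its iteration and log parameters, and shutdown() are hypothetical names made up for this example. Unlike the diagnostic patch in the comment above, which halts the JVM, this variant logs the failure and keeps the thread running.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for a network thread; NOT Kafka's Processor class.
// `iteration` plays the role of one pass of the thread's work loop,
// `log` the role of the error logger.
final class GuardedLoop(iteration: () => Unit, log: String => Unit) extends Runnable {
  @volatile private var running = true

  // Ask the loop to stop after the current iteration.
  def shutdown(): Unit = running = false

  override def run(): Unit = {
    while (running) {
      try {
        iteration()
      } catch {
        // Log non-fatal errors and continue, so a single bad iteration
        // (for example a NoSuchElementException) does not kill the thread silently.
        case NonFatal(e) => log("Error in network thread iteration: " + e)
      }
    }
  }
}
```

NonFatal deliberately lets errors such as OutOfMemoryError propagate. Whether continuing after an arbitrary exception is safe depends on what state the failed iteration left behind, so this is one possible design, not a recommendation for the actual fix.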