[ https://issues.apache.org/jira/browse/KAFKA-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neha Narkhede resolved KAFKA-702.
---------------------------------

    Resolution: Fixed

Checked this in to proceed with deployment.

> Deadlock between request handler/processor threads
> --------------------------------------------------
>
>                 Key: KAFKA-702
>                 URL: https://issues.apache.org/jira/browse/KAFKA-702
>             Project: Kafka
>          Issue Type: Bug
>          Components: network
>    Affects Versions: 0.8
>            Reporter: Joel Koshy
>            Assignee: Jay Kreps
>            Priority: Blocker
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: KAFKA-702-v1.patch
>
>
> We have seen this a couple of times in the past few days in a test cluster.
> The request handler and processor threads deadlock on the request/response
> queues, bringing the server to a halt.
>
> "kafka-processor-10251-7" prio=10 tid=0x00007f4a0c3c9800 nid=0x4c39 waiting on condition [0x00007f46f698e000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00007f48c9dd2698> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>         at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:252)
>         at kafka.network.RequestChannel.sendRequest(RequestChannel.scala:107)
>         at kafka.network.Processor.read(SocketServer.scala:321)
>         at kafka.network.Processor.run(SocketServer.scala:231)
>         at java.lang.Thread.run(Thread.java:619)
>
> "kafka-request-handler-7" daemon prio=10 tid=0x00007f4a0c57f000 nid=0x4c47 waiting on condition [0x00007f46f5b80000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00007f48c9dd6348> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>         at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:252)
>         at kafka.network.RequestChannel.sendResponse(RequestChannel.scala:112)
>         at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:198)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:58)
>         at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:41)
>         at java.lang.Thread.run(Thread.java:619)
>
> This is because there is a cycle in the wait-for graph of processor threads
> and request handler threads. If request handling slows down on a busy
> server, the request queue fills up. All processor threads quickly block on
> adding incoming requests to the request queue. Because of this, those threads
> do not process responses, so their response queues fill up. At that point,
> the request handler threads start blocking on adding responses to the
> respective response queues. This can lead to a deadlock where each thread is
> blocked on one queue while the thread that could drain it is blocked on the
> other. This brings the server to a halt: it still accepts connections, but
> every request times out.
>
> One way to resolve this is to break the cycle in the wait-for graph of the
> request handler and processor threads. Instead of having the processor
> threads dispatch the responses, we can have one or more dedicated response
> handler threads that dequeue responses from the queue and write them to the
> socket. One downside of this approach is that access to the selector will
> then have to be synchronized.
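
The wait-for cycle in the description, and the effect of a dedicated response thread, can be reproduced in miniature. What follows is a minimal Java sketch under stated assumptions, not Kafka's code and not the contents of KAFKA-702-v1.patch: the class `QueueCycleSketch`, the queue capacities of 1, the batch of four requests, and the timeouts are all hypothetical values chosen so that the stall shows up quickly. The calling thread plays the processor, and `offer` with a timeout stands in for the blocking `put` seen in the stack traces so that the sketch reports a stall instead of hanging. The selector, and the synchronization it would need, is not modeled.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from Kafka.
class QueueCycleSketch {
    // Assumed batch size: one more than the two queues plus the handler can absorb.
    static final int REQUESTS = 4;

    interface Task { void run() throws InterruptedException; }

    // Wraps a blocking loop in a daemon thread that exits quietly on interrupt.
    static Thread daemon(Task task) {
        Thread t = new Thread(() -> {
            try { task.run(); } catch (InterruptedException e) { /* shutting down */ }
        });
        t.setDaemon(true);
        return t;
    }

    // Returns true if every response was written, false if the threads stalled.
    static boolean run(boolean dedicatedResponder) throws InterruptedException {
        BlockingQueue<Integer> requestQueue = new ArrayBlockingQueue<>(1);
        BlockingQueue<Integer> responseQueue = new ArrayBlockingQueue<>(1);
        CountDownLatch written = new CountDownLatch(REQUESTS);

        // Request handler: takes a request, then blocks until its response fits.
        Thread handler = daemon(() -> {
            while (true) responseQueue.put(requestQueue.take());
        });
        // Dedicated responder: does nothing but drain the response queue.
        Thread responder = daemon(() -> {
            while (true) { responseQueue.take(); written.countDown(); }
        });
        handler.start();
        if (dedicatedResponder) responder.start();

        // Processor role: enqueue the whole batch of requests first.
        boolean ok = true;
        for (int i = 0; i < REQUESTS && ok; i++) {
            ok = requestQueue.offer(i, 200, TimeUnit.MILLISECONDS);
        }
        // Without a responder, the processor only reaches its own response
        // handling after it has finished enqueueing requests.
        for (int i = 0; i < REQUESTS && ok && !dedicatedResponder; i++) {
            ok = responseQueue.poll(200, TimeUnit.MILLISECONDS) != null;
            if (ok) written.countDown();
        }
        ok = ok && written.await(1, TimeUnit.SECONDS);
        handler.interrupt();
        responder.interrupt();
        return ok;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("processor drains responses itself: completed=" + run(false));
        System.out.println("dedicated responder thread:        completed=" + run(true));
    }
}
```

With these assumed sizes, the first call stalls: the handler can consume at most two requests before it blocks on the full response queue, the request queue holds one more, and the fourth enqueue never succeeds. In the second call the responder keeps the response queue moving, so the handler and the processor both make progress.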