[ https://issues.apache.org/jira/browse/KAFKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380776#comment-14380776 ]
Onur Karaman commented on KAFKA-2046:
-------------------------------------

It looks like I hit a deadlock yesterday between the DeleteTopicsThread and RequestSendThread on the controller when controller.message.queue.size was small.

When the blocking queue shared between DeleteTopicsThread and RequestSendThread is full, the DeleteTopicsThread's put blocks while holding the controller lock until the RequestSendThread takes items from the queue. But after sending a request, the RequestSendThread runs a callback that needs the controller lock in order to finish processing that request, so neither thread can make progress and the controller hangs.

Delete topic performs a state transition to ReplicaDeletionStarted, and this state transition involves a callback (deleteTopicStopReplicaCallback) that waits on the controller lock. This explains why grep "handling stop replica (delete=true)" kafka-state-change.log showed only one replica: processing hung in the callback for that replica's transition to ReplicaDeletionStarted.

Bumping up controller.message.queue.size doesn't get rid of the deadlock but should make it less common. I think this should only be considered a temporary fix, as tweaking a config value shouldn't decide whether or not we hit a deadlock.

Here are snippets of the thread dump:
{code}
"Controller-xyz-to-broker-abc-send-thread" ...
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    ...
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    ...
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
    at kafka.utils.Utils$.inLock(Utils.scala:564)
    at kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$deleteTopicStopReplicaCallback(TopicDeletionManager.scala:371)
    at kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2$$anonfun$apply$3.apply(TopicDeletionManager.scala:338)
    at kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2$$anonfun$apply$3.apply(TopicDeletionManager.scala:338)
    at kafka.controller.ControllerBrokerRequestBatch$$anonfun$addStopReplicaRequestForBrokers$2$$anonfun$apply$mcVI$sp$2.apply(ControllerChannelManager.scala:231)
    at kafka.controller.ControllerBrokerRequestBatch$$anonfun$addStopReplicaRequestForBrokers$2$$anonfun$apply$mcVI$sp$2.apply(ControllerChannelManager.scala:231)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:161)

"delete-topics-thread-xyz" ...
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    ...
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:350)
    at kafka.controller.ControllerChannelManager.sendRequest(ControllerChannelManager.scala:57)
    ...
    at kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:310)
    at kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:115)
    ...
    at kafka.controller.TopicDeletionManager.startReplicaDeletion(TopicDeletionManager.scala:327)
    at kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onPartitionDeletion(TopicDeletionManager.scala:360)
    ...
    at kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onTopicDeletion(TopicDeletionManager.scala:305)
    ...
    at kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:390)
    at kafka.utils.Utils$.inLock(Utils.scala:566)
    at kafka.controller.TopicDeletionManager$DeleteTopicsThread.doWork(TopicDeletionManager.scala:390)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
{code}

> Delete topic still doesn't work
> -------------------------------
>
>                 Key: KAFKA-2046
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2046
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Clark Haskins
>            Assignee: Onur Karaman
>
> I just attempted to delete a 128-partition topic with all inbound producers
> stopped.
> The result was as follows:
> The /admin/delete_topics znode was empty
> the topic under /brokers/topics was removed
> The Kafka topics command showed that the topic was removed
> However, the data on disk on each broker was not deleted and the topic has
> not yet been re-created by starting up the inbound mirror maker.
> Let me know what additional information is needed

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
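
For anyone who wants to see the lock/queue interaction from the comment in isolation, here is a minimal sketch in plain Java. It is not Kafka code: the class name QueueLockDeadlockSketch, the method producerStalls, and the variable names are made up for illustration. It also assumes one deliberate change: the producer uses a timed offer() instead of the blocking put() seen in the thread dump, so the program reports the stall and exits rather than hanging forever.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: a stand-in for the controller lock, the bounded request
// queue, and the two threads described in the comment. None of these names
// come from the Kafka code base.
class QueueLockDeadlockSketch {

    /**
     * Enqueues `requests` callbacks while holding the lock.
     * Returns true if an enqueue timed out, i.e. a blocking put() would have hung.
     */
    static boolean producerStalls(int queueSize, int requests, long timeoutMs) {
        ReentrantLock controllerLock = new ReentrantLock();
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(queueSize);

        // Plays the role of the send thread: take a request, then run its callback.
        Thread sender = new Thread(() -> {
            try {
                while (true) {
                    Runnable callback = queue.take();
                    callback.run(); // blocks here while the producer holds the lock
                }
            } catch (InterruptedException e) {
                // interrupted by the producer once the experiment is over
            }
        }, "send-thread-sketch");
        sender.setDaemon(true);
        sender.start();

        // The callback needs the same lock the producer is holding.
        Runnable callback = () -> {
            controllerLock.lock();
            try {
                // state update would go here
            } finally {
                controllerLock.unlock();
            }
        };

        boolean stalled = false;
        try {
            // Plays the role of the delete-topics thread: enqueue under the lock.
            controllerLock.lock();
            try {
                for (int i = 0; i < requests; i++) {
                    // Timed offer instead of put(), so the sketch can detect the stall.
                    if (!queue.offer(callback, timeoutMs, TimeUnit.MILLISECONDS)) {
                        stalled = true;
                        break;
                    }
                }
            } finally {
                controllerLock.unlock();
            }
            sender.interrupt();
            sender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stalled;
    }

    public static void main(String[] args) {
        System.out.println("queueSize=1,  requests=3 -> stalled: " + producerStalls(1, 3, 200));
        System.out.println("queueSize=10, requests=3 -> stalled: " + producerStalls(10, 3, 200));
    }
}
```

With a queue of size 1, the sender can absorb at most one request before it blocks in the callback, so the third enqueue can never succeed while the lock is held. With a queue of size 10, the same three requests fit without waiting. That matches the observation above: a larger queue only makes the stall less likely, it does not remove it.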