[ 
https://issues.apache.org/jira/browse/KAFKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380776#comment-14380776
 ] 

Onur Karaman commented on KAFKA-2046:
-------------------------------------

It looks like I hit a deadlock yesterday between the DeleteTopicsThread and 
RequestSendThread on the controller when controller.message.queue.size was 
small.

When the blocking queue shared between DeleteTopicsThread and RequestSendThread
is full, a put from the DeleteTopicsThread blocks while it is still holding the
controller lock, and it stays blocked until the RequestSendThread takes items
from the queue. The RequestSendThread, however, runs a callback after sending
each request, and that callback needs the controller lock before it can finish
processing the request. Neither thread can make progress, which causes the hang.
In this case, delete topic performs a state transition to
ReplicaDeletionStarted, and this state transition involves a callback
(deleteTopicStopReplicaCallback) that waits on the controller lock.
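
To make the interaction concrete, here is a minimal Java sketch of the same
pattern. It is not Kafka code: the class name, the queue capacity of 1, the
five requests, and the one-second wait are all assumptions chosen for
illustration. One thread blocks in put() on a full queue while holding a lock,
and the other takes one item and then parks waiting for that lock.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the controller; none of these names are Kafka's.
class ControllerDeadlockSketch {

    // Returns true if both threads end up parked, as in a deadlocked thread dump.
    static boolean bothThreadsParked() {
        ReentrantLock controllerLock = new ReentrantLock();
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1); // deliberately tiny

        // Plays the role of DeleteTopicsThread: enqueues requests under the lock.
        Thread deleteThread = new Thread(() -> {
            try {
                controllerLock.lockInterruptibly();
                try {
                    for (int replica = 0; replica < 5; replica++) {
                        queue.put("stop-replica-" + replica); // blocks when full, lock still held
                    }
                } finally {
                    controllerLock.unlock();
                }
            } catch (InterruptedException e) {
                // interrupted by the cleanup below
            }
        }, "delete-topics-thread");

        // Plays the role of RequestSendThread: the post-send callback needs the lock.
        Thread sendThread = new Thread(() -> {
            try {
                while (true) {
                    queue.take();                       // "send" one request
                    controllerLock.lockInterruptibly(); // callback waits for the lock
                    controllerLock.unlock();
                }
            } catch (InterruptedException e) {
                // interrupted by the cleanup below
            }
        }, "send-thread");

        try {
            deleteThread.start();
            sendThread.start();
            deleteThread.join(1000); // would return early if all five puts could complete
            boolean parked = deleteThread.getState() == Thread.State.WAITING
                    && sendThread.getState() == Thread.State.WAITING;
            // Interruptible waits are used only so that the sketch can clean up.
            deleteThread.interrupt();
            sendThread.interrupt();
            deleteThread.join();
            sendThread.join();
            return parked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("both threads parked: " + bothThreadsParked());
    }
}
```

In the sketch the sender can absorb only one request before it needs the lock,
so the enqueuing thread stalls as soon as the queue fills up again.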

This explains why grep "handling stop replica (delete=true)"
kafka-state-change.log showed only one replica: the controller hung on the
callback of that replica's transition to ReplicaDeletionStarted.

Bumping up controller.message.queue.size doesn't get rid of the deadlock but
should make it less common. I think this should only be considered a temporary
workaround, as tweaking a config value shouldn't decide whether or not we hit a
deadlock.
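
A rough way to see why a larger queue only postpones the problem is to vary the
capacity in a small simulation. The Java sketch below is an illustration under
assumptions, not Kafka code: the class and method names are hypothetical, and
offer() with a 300 ms timeout stands in for the real blocking put() so that the
program terminates. It reports whether a thread that enqueues a batch of
requests while holding a lock gets stuck.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical simulation; queueSize plays the role of controller.message.queue.size.
class QueueSizeSketch {

    // Returns true if enqueuing requestsUnderLock items while holding the lock stalls.
    static boolean stalls(int queueSize, int requestsUnderLock) {
        ReentrantLock controllerLock = new ReentrantLock();
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(queueSize);

        // Sender: takes a request, then needs the lock for its callback.
        Thread sender = new Thread(() -> {
            try {
                while (true) {
                    queue.take();
                    controllerLock.lockInterruptibly();
                    controllerLock.unlock();
                }
            } catch (InterruptedException e) {
                // stopped by the caller
            }
        });
        sender.setDaemon(true);
        sender.start();

        boolean stalled = false;
        try {
            controllerLock.lock();
            try {
                for (int i = 0; i < requestsUnderLock && !stalled; i++) {
                    // A timed offer replaces put(); a timeout means put() would have hung.
                    stalled = !queue.offer(i, 300, TimeUnit.MILLISECONDS);
                }
            } finally {
                controllerLock.unlock();
            }
            sender.interrupt();
            sender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stalled;
    }

    public static void main(String[] args) {
        System.out.println("size 10, 5 requests stalls: " + stalls(10, 5));
        System.out.println("size 10, 50 requests stalls: " + stalls(10, 50));
        System.out.println("size 100, 500 requests stalls: " + stalls(100, 500));
    }
}
```

In this model the stall depends only on whether the batch enqueued under the
lock outgrows the queue, so raising the capacity moves the threshold without
removing it.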

Here are snippets of the thread dump:
{code}
"Controller-xyz-to-broker-abc-send-thread" ...
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        ...
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        ...
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at kafka.utils.Utils$.inLock(Utils.scala:564)
        at kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$deleteTopicStopReplicaCallback(TopicDeletionManager.scala:371)
        at kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2$$anonfun$apply$3.apply(TopicDeletionManager.scala:338)
        at kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2$$anonfun$apply$3.apply(TopicDeletionManager.scala:338)
        at kafka.controller.ControllerBrokerRequestBatch$$anonfun$addStopReplicaRequestForBrokers$2$$anonfun$apply$mcVI$sp$2.apply(ControllerChannelManager.scala:231)
        at kafka.controller.ControllerBrokerRequestBatch$$anonfun$addStopReplicaRequestForBrokers$2$$anonfun$apply$mcVI$sp$2.apply(ControllerChannelManager.scala:231)
        at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:161)

"delete-topics-thread-xyz" ...
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        ...
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:350)
        at kafka.controller.ControllerChannelManager.sendRequest(ControllerChannelManager.scala:57)
        ...
        at kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:310)
        at kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:115)
        ...
        at kafka.controller.TopicDeletionManager.startReplicaDeletion(TopicDeletionManager.scala:327)
        at kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onPartitionDeletion(TopicDeletionManager.scala:360)
        ...
        at kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onTopicDeletion(TopicDeletionManager.scala:305)
        ...
        at kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:390)
        at kafka.utils.Utils$.inLock(Utils.scala:566)
        at kafka.controller.TopicDeletionManager$DeleteTopicsThread.doWork(TopicDeletionManager.scala:390)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
{code}

> Delete topic still doesn't work
> -------------------------------
>
>                 Key: KAFKA-2046
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2046
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Clark Haskins
>            Assignee: Onur Karaman
>
> I just attempted to delete a 128-partition topic with all inbound producers 
> stopped.
> The result was as follows:
> The /admin/delete_topics znode was empty
> the topic under /brokers/topics was removed
> The Kafka topics command showed that the topic was removed
> However, the data on disk on each broker was not deleted and the topic has 
> not yet been re-created by starting up the inbound mirror maker.
> Let me know what additional information is needed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
