[ https://issues.apache.org/jira/browse/KAFKA-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161942#comment-14161942 ]
Sriharsha Chintalapani commented on KAFKA-1663:
-----------------------------------------------

[~nehanarkhede] Both TopicDeletionManager.resumeTopicDeletionThread() and awaitTopicDeletionNotification() use deleteLock, and DeleteTopicsThread.doWork() waits on awaitTopicDeletionNotification() before it tries to acquire controllerLock. So a simple fix would be to check whether there are any topics in the topicsToBeDeleted set and, if so, call resumeTopicDeletionThread() from start(). I agree that it is best to consolidate on a single lock.

> Controller unable to shutdown after a soft failure
> --------------------------------------------------
>
>                 Key: KAFKA-1663
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1663
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Sriharsha Chintalapani
>            Assignee: Sriharsha Chintalapani
>            Priority: Blocker
>             Fix For: 0.8.2
>
>         Attachments: KAFKA-1663.patch
>
>
> As part of testing KAFKA-1558 I came across a case where inducing a soft
> failure in the current controller elects a new controller, but the old
> controller doesn't shut down properly.
> Steps to reproduce:
> 1) Run a 5-broker cluster.
> 2) Create a high number of topics (I tested it with 1000 topics).
> 3) On the current controller, run kill -SIGSTOP pid (the broker's process id).
> 4) Wait a little longer than the ZooKeeper session timeout (set in server.properties).
> 5) Run kill -SIGCONT pid.
> 6) A new controller will be elected. Check the old controller's log:
> [2014-09-30 15:59:53,398] INFO [SessionExpirationListener on 1], ZK expired;
> shut down all controller components and try to re-elect
> (kafka.controller.KafkaController$SessionExpirationListener)
> [2014-09-30 15:59:53,400] INFO [delete-topics-thread-1], Shutting down
> (kafka.controller.TopicDeletionManager$DeleteTopicsThread)
> If the log stops there and the broker keeps printing
> Cached zkVersion [0] not equal to that in zookeeper, skip updating ISR
> (kafka.cluster.Partition)
> then the controller shutdown never completes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
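The start-up fix proposed in the comment (signal the deletion thread from start() when topicsToBeDeleted is already non-empty) can be illustrated with a minimal Java sketch. Everything here is hypothetical: DeletionManagerSketch, its deleteTopicStateChanged and running flags, pendingCount() and the rounds counter are stand-ins written for illustration, not Kafka's TopicDeletionManager (which is Scala), and controllerLock is deliberately left out. The worker parks on a Condition of deleteLock; the flag remembers a resume that arrives before the worker starts waiting, and shutdown() clears running and signals the same condition so a parked worker can exit instead of blocking shutdown.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a topic-deletion manager; not Kafka's TopicDeletionManager.
class DeletionManagerSketch {
    private final ReentrantLock deleteLock = new ReentrantLock();
    private final Condition deleteTopicsCond = deleteLock.newCondition();
    // The three fields below are guarded by deleteLock.
    private final Set<String> topicsToBeDeleted = new HashSet<>();
    private boolean deleteTopicStateChanged = false; // remembers a resume sent before the worker waits
    private boolean running = true;
    final AtomicInteger rounds = new AtomicInteger(); // completed deletion passes, for observation
    private Thread worker;

    DeletionManagerSketch(Collection<String> pendingTopics) {
        topicsToBeDeleted.addAll(pendingTopics);
    }

    // Blocks until resumed or shut down; returns false when the worker should exit.
    private boolean awaitTopicDeletionNotification() throws InterruptedException {
        deleteLock.lock();
        try {
            while (running && !deleteTopicStateChanged) {
                deleteTopicsCond.await(); // releases deleteLock while parked
            }
            deleteTopicStateChanged = false;
            return running;
        } finally {
            deleteLock.unlock();
        }
    }

    void resumeTopicDeletionThread() {
        deleteLock.lock();
        try {
            deleteTopicStateChanged = true;
            deleteTopicsCond.signal();
        } finally {
            deleteLock.unlock();
        }
    }

    int pendingCount() {
        deleteLock.lock();
        try {
            return topicsToBeDeleted.size();
        } finally {
            deleteLock.unlock();
        }
    }

    void start() {
        worker = new Thread(() -> {
            try {
                while (awaitTopicDeletionNotification()) {
                    // A real controller would acquire controllerLock here before deleting topics.
                    deleteLock.lock();
                    try {
                        topicsToBeDeleted.clear();
                    } finally {
                        deleteLock.unlock();
                    }
                    rounds.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "delete-topics-thread-sketch");
        worker.setDaemon(true); // a stuck sketch never keeps the JVM alive
        worker.start();
        // The proposed fix: if deletions were already pending at start-up, wake the worker now.
        if (pendingCount() > 0) {
            resumeTopicDeletionThread();
        }
    }

    // Returns true if the worker exited within timeoutMs.
    boolean shutdown(long timeoutMs) throws InterruptedException {
        deleteLock.lock();
        try {
            running = false;
            deleteTopicsCond.signalAll(); // wake a parked worker so it observes running == false
        } finally {
            deleteLock.unlock();
        }
        worker.join(timeoutMs);
        return !worker.isAlive();
    }
}
```

Clearing running and signalling while holding deleteLock avoids a missed wakeup: the worker checks both flags and parks atomically with respect to that lock, so it either sees running == false or is already waiting when signalAll() runs. The interaction with controllerLock, and the single-lock consolidation the comment agrees with, are not modeled here.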