[ 
https://issues.apache.org/jira/browse/KAFKA-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161942#comment-14161942
 ] 

Sriharsha Chintalapani commented on KAFKA-1663:
-----------------------------------------------

[~nehanarkhede] Both TopicDeletionManager.resumeTopicDeletionThread() and 
awaitTopicDeletionNotification() use deleteLock, and 
DeleteTopicsThread.doWork() waits in awaitTopicDeletionNotification() before 
it tries to acquire controllerLock.
So a simple fix would be to check whether there are any topics in the 
topicsToBeDeleted set and, if so, call resumeTopicDeletionThread() from 
start(). 
I agree that it is best to consolidate on a single lock.
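
To illustrate the simple fix, here is a minimal Java sketch. It is a simplified model and not the actual Kafka (Scala) code: the class name TopicDeletionSketch, the enqueueTopic() helper, the deleteTopicsCond and deleteTopicStateChanged fields, and the timeout parameter on awaitTopicDeletionNotification() are assumptions made only so the example is self-contained and terminates. It shows start() waking the waiting thread when topicsToBeDeleted is non-empty.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Simplified, hypothetical model of the deletion manager's wait/notify path.
class TopicDeletionSketch {
    private final ReentrantLock deleteLock = new ReentrantLock();
    private final Condition deleteTopicsCond = deleteLock.newCondition();
    private final AtomicBoolean deleteTopicStateChanged = new AtomicBoolean(false);
    private final Set<String> topicsToBeDeleted = ConcurrentHashMap.newKeySet();

    // Hypothetical helper: queue a topic for deletion without notifying anyone.
    void enqueueTopic(String topic) {
        topicsToBeDeleted.add(topic);
    }

    // Proposed fix: if work is already queued when the manager starts,
    // wake the deletion thread instead of leaving it waiting.
    void start() {
        if (!topicsToBeDeleted.isEmpty()) {
            resumeTopicDeletionThread();
        }
    }

    // Records the state change, then signals the waiter under deleteLock.
    void resumeTopicDeletionThread() {
        deleteTopicStateChanged.set(true);
        deleteLock.lock();
        try {
            deleteTopicsCond.signal();
        } finally {
            deleteLock.unlock();
        }
    }

    // Waits under deleteLock until a state change is recorded.
    // The timeout is an assumption of this sketch so that it cannot hang.
    // Returns true if notified, false on timeout or interruption.
    boolean awaitTopicDeletionNotification(long timeoutMs) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        deleteLock.lock();
        try {
            while (!deleteTopicStateChanged.compareAndSet(true, false)) {
                if (nanos <= 0) {
                    return false;
                }
                nanos = deleteTopicsCond.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            deleteLock.unlock();
        }
    }
}
```

In this model, a waiter that calls awaitTopicDeletionNotification() after start() returns promptly when topics were already queued, and times out when there were none.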

> Controller unable to shutdown after a soft failure
> --------------------------------------------------
>
>                 Key: KAFKA-1663
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1663
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Sriharsha Chintalapani
>            Assignee: Sriharsha Chintalapani
>            Priority: Blocker
>             Fix For: 0.8.2
>
>         Attachments: KAFKA-1663.patch
>
>
> While testing KAFKA-1558 I came across a case where inducing a soft 
> failure in the current controller elects a new controller, but the old 
> controller doesn't shut down properly.
> Steps to reproduce:
> 1) Set up a 5-broker cluster.
> 2) Create a high number of topics (I tested with 1000 topics).
> 3) On the current controller, run kill -SIGSTOP pid (the broker's process id).
> 4) Wait a bit longer than the ZooKeeper timeout (server.properties).
> 5) Run kill -SIGCONT pid.
> 6) A new controller will be elected. Check the old controller's
> log:
> [2014-09-30 15:59:53,398] INFO [SessionExpirationListener on 1], ZK expired; 
> shut down all controller components and try to re-elect 
> (kafka.controller.KafkaController$SessionExpirationListener)
> [2014-09-30 15:59:53,400] INFO [delete-topics-thread-1], Shutting down 
> (kafka.controller.TopicDeletionManager$DeleteTopicsThread)
> If it stops there and the broker log keeps printing 
> Cached zkVersion [0] not equal to that in zookeeper, skip updating ISR 
> (kafka.cluster.Partition)
> then the controller shutdown never completes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
