[jira] [Commented] (KAFKA-1663) Controller unable to shutdown after a soft failure

Sriharsha Chintalapani (JIRA) Thu, 02 Oct 2014 17:36:07 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157511#comment-14157511
 ]


Sriharsha Chintalapani commented on KAFKA-1663:
-----------------------------------------------

[~nehanarkhede] I have a question about the TopicDeletionManager code.
I think the idea is to start the DeleteTopicsThread and have it wait for events,
such as topics being added to the delete set, which then trigger the doWork() method.
Right now the doWork() method calls awaitTopicDeletionNotification:

  private def awaitTopicDeletionNotification() {
    inLock(deleteLock) {
      while(!deleteTopicsThread.isRunning.get() && !deleteTopicStateChanged.compareAndSet(true, false)) {
        deleteTopicsCond.await()
      }
    }
  }
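This condition seems to be wrong: it doesn't block the DeleteTopicsThread. Once
the TopicDeletionManager starts the thread, !deleteTopicsThread.isRunning.get()
is false, so the && short-circuits, await() is never called, and doWork()
continues to execute without waiting for a notification.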

The condition above should instead read:

  while(deleteTopicsThread.isRunning.get() && deleteTopicStateChanged.compareAndSet(true, false)) {
    deleteTopicsCond.await()
  }
Whenever a topic is added, we call resumeTopicDeletionThread(), which sets
deleteTopicStateChanged to true and calls deleteTopicsCond.signal(). That should
wake up doWork() so it can continue with deleting the topic.
I am testing with this change and will update with the results.
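For illustration, a minimal, self-contained sketch of the wait/signal handshake
described above, using a ReentrantLock, its Condition, and an AtomicBoolean flag.
The class DeletionSignal and its methods are hypothetical stand-ins, not Kafka's
TopicDeletionManager; inLock is a local helper with the same shape as the one in
the snippet. In this sketch the worker keeps waiting while it is running and the
flag has not yet been flipped (note the ! before compareAndSet), so it proceeds
only after resume() has set the flag, and shutdown() wakes it so it can exit.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import java.util.concurrent.locks.{Lock, ReentrantLock}

// Hypothetical stand-in for the signalling state of a topic-deletion worker.
class DeletionSignal {
  private val deleteLock = new ReentrantLock()
  private val deleteTopicsCond = deleteLock.newCondition()
  private val deleteTopicStateChanged = new AtomicBoolean(false)
  private val isRunning = new AtomicBoolean(true)

  // Runs `fun` while holding `lock` (same shape as the inLock used in the snippet).
  private def inLock[T](lock: Lock)(fun: => T): T = {
    lock.lock()
    try fun
    finally lock.unlock()
  }

  // Called by the worker before each unit of work. Blocks while the worker is
  // running and no resume has been recorded; the loop also absorbs spurious wakeups.
  // Returns true if woken to do work, false if woken because of shutdown.
  def awaitNotification(): Boolean = inLock(deleteLock) {
    while (isRunning.get() && !deleteTopicStateChanged.compareAndSet(true, false)) {
      deleteTopicsCond.await()
    }
    isRunning.get()
  }

  // Called when topics are added to the delete set: record the change, wake the worker.
  def resume(): Unit = inLock(deleteLock) {
    deleteTopicStateChanged.set(true)
    deleteTopicsCond.signal()
  }

  // Stops the worker and wakes it, so a shutdown does not hang in await().
  def shutdown(): Unit = inLock(deleteLock) {
    isRunning.set(false)
    deleteTopicsCond.signalAll()
  }
}
```

Because the flag is checked under the same lock that resume() holds while
signalling, a resume that happens before the worker starts waiting is not lost:
the next compareAndSet(true, false) succeeds and the loop exits immediately.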


> Controller unable to shutdown after a soft failure
> --------------------------------------------------
>
>                 Key: KAFKA-1663
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1663
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Sriharsha Chintalapani
>            Priority: Blocker
>             Fix For: 0.8.2
>
>
> As part of testing KAFKA-1558, I came across a case where inducing a soft
> failure in the current controller elects a new controller, but the old
> controller doesn't shut down properly.
> Steps to reproduce:
> 1) 5-broker cluster
> 2) A high number of topics (I tested it with 1000 topics)
> 3) On the current controller, run kill -SIGSTOP pid (the broker's process id)
> 4) Wait a bit longer than the ZooKeeper timeout (server.properties)
> 5) kill -SIGCONT pid
> 6) A new controller will be elected. Check the old controller's log:
> [2014-09-30 15:59:53,398] INFO [SessionExpirationListener on 1], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
> [2014-09-30 15:59:53,400] INFO [delete-topics-thread-1], Shutting down (kafka.controller.TopicDeletionManager$DeleteTopicsThread)
> If the log stops there and the broker log keeps printing
> Cached zkVersion [0] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> then the controller shutdown never completes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
