Fedor Korotkiy created KAFKA-1310:
-------------------------------------

             Summary: Zookeeper timeout causes deadlock in Controller
                 Key: KAFKA-1310
                 URL: https://issues.apache.org/jira/browse/KAFKA-1310
             Project: Kafka
          Issue Type: Bug
            Reporter: Fedor Korotkiy

Steps to reproduce:

1. Check out and build the 0.8.1 branch from GitHub:
   git clone g...@github.com:apache/kafka.git && cd kafka && git checkout origin/0.8.1 && ./gradlew jar
2. Start the ZooKeeper server:
   ./bin/zookeeper-server-start.sh config/zookeeper.properties
3. Start the Kafka server:
   ./bin/kafka-server-start.sh config/server.properties
4. Suspend the ZooKeeper process for 10 seconds (Ctrl-Z, then %1 to resume it).
5. Observe that Kafka has not been re-registered in ZooKeeper:
   ./bin/zookeeper-shell.sh ls /brokers/ids
   >> []

The root cause of the problem seems to be a deadlock between DeleteTopicsThread and SessionExpirationListener in KafkaController:

1. DeleteTopicsThread acquires controllerLock and await()-s on deleteTopicsCond in awaitTopicDeletionNotification(). While it waits, await() releases controllerLock.
2. SessionExpirationListener fires. It acquires controllerLock and tries to shut down deleteTopicManager (in onControllerResignation()). This interrupts DeleteTopicsThread, and the listener presumably keeps holding controllerLock while it waits for the thread to stop.
3. DeleteTopicsThread can't return from deleteTopicsCond.await(), because await() must reacquire controllerLock before returning (even when interrupted) and the lock is held by the listener. Neither thread can make progress, so we have a deadlock.
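
The lock/condition interaction described in the root cause can be reproduced outside Kafka. The following is a minimal Java sketch, not the KafkaController code: the class name ControllerDeadlockSketch and the method workerStuckWhileLockHeld are hypothetical, and controllerLock / deleteTopicsCond are only stand-ins named after the fields mentioned above. It assumes the shutdown path waits for the thread while still holding the lock; a bounded join() is used in place of an indefinite wait so that the demo terminates instead of hanging.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class ControllerDeadlockSketch {
    // Stand-ins for the controller's lock and condition (names borrowed from the report).
    static final ReentrantLock controllerLock = new ReentrantLock();
    static final Condition deleteTopicsCond = controllerLock.newCondition();

    /**
     * Interrupts a thread waiting on the condition while the caller holds the lock.
     * Returns true if the thread was still alive after joinTimeoutMs, i.e. it could
     * not leave await() because it could not reacquire the lock.
     */
    static boolean workerStuckWhileLockHeld(long joinTimeoutMs) throws InterruptedException {
        Thread worker = new Thread(() -> {
            controllerLock.lock();
            try {
                while (true) {
                    deleteTopicsCond.await(); // releases the lock while waiting
                }
            } catch (InterruptedException e) {
                // Reached only after await() has reacquired the lock.
            } finally {
                controllerLock.unlock();
            }
        }, "delete-topics-thread");
        worker.start();

        // Wait until the worker is actually parked on the condition.
        while (true) {
            controllerLock.lock();
            try {
                if (controllerLock.hasWaiters(deleteTopicsCond)) {
                    break;
                }
            } finally {
                controllerLock.unlock();
            }
            Thread.sleep(10);
        }

        boolean stuck;
        controllerLock.lock(); // plays the role of the session expiration handler
        try {
            worker.interrupt();
            // An unbounded wait here would never return; the timeout keeps the demo finite.
            worker.join(joinTimeoutMs);
            stuck = worker.isAlive();
        } finally {
            controllerLock.unlock();
        }
        worker.join(); // with the lock released, the worker can exit
        return stuck;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker stuck while lock held: " + workerStuckWhileLockHeld(200));
    }
}
```

In this sketch the worker only exits once the lock is released, which suggests two possible ways out of the deadlock: release the lock before waiting for the thread to stop, or avoid waiting for the thread from inside the locked section. Which of these fits KafkaController is not something this sketch decides.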