Omid Aladini created KAFKA-1918:
-----------------------------------

             Summary: System test for ZooKeeper quorum failure scenarios
                 Key: KAFKA-1918
                 URL: https://issues.apache.org/jira/browse/KAFKA-1918
             Project: Kafka
          Issue Type: Test
            Reporter: Omid Aladini

Following up on the [conversation on the mailing list|http://mail-archives.apache.org/mod_mbox/kafka-users/201502.mbox/%3CCAHwHRrX3SAWDUGF5LjU4rrMUsqv%3DtJcyjX7OENeL5C_V5o3tCw%40mail.gmail.com%3E], the FAQ states:

{quote}
Once the Zookeeper quorum is down, brokers could result in a bad state and could not normally serve client requests, etc. Although when Zookeeper quorum recovers, the Kafka brokers should be able to resume to normal state automatically, _there are still a few +corner cases+ the they cannot and a hard kill-and-recovery is required to bring it back to normal_. Hence it is recommended to closely monitor your zookeeper cluster and provision it so that it is performant.
{quote}

Since ZK quorum failures are inevitable (due to rolling upgrades of ZK, leader hardware failure, etc.), it would be great to have a system test that identifies these corner cases (if they still exist) so that they can be fixed if necessary.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)