[ https://issues.apache.org/jira/browse/KAFKA-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541127#comment-14541127 ]
Geoffrey Anderson commented on KAFKA-1918:
------------------------------------------

Just making sure you're aware of work we're doing at Confluent on system tests. I'll be posting a KIP for this soon, but here's some info:

The original plan is sketched here:
https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements

This is the core library/test framework (WIP) which aids in writing and running the tests
https://github.com/confluentinc/ducktape/

This has system tests we've written to date for the Confluent Platform
https://github.com/confluentinc/muckrake

> System test for ZooKeeper quorum failure scenarios
> --------------------------------------------------
>
>                 Key: KAFKA-1918
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1918
>             Project: Kafka
>          Issue Type: Test
>            Reporter: Omid Aladini
>
> Following up on the [conversation on the mailing list|http://mail-archives.apache.org/mod_mbox/kafka-users/201502.mbox/%3CCAHwHRrX3SAWDUGF5LjU4rrMUsqv%3DtJcyjX7OENeL5C_V5o3tCw%40mail.gmail.com%3E],
> the FAQ writes:
> {quote}
> Once the Zookeeper quorum is down, brokers could result in a bad state and
> could not normally serve client requests, etc. Although when Zookeeper quorum
> recovers, the Kafka brokers should be able to resume to normal state
> automatically, _there are still a few +corner cases+ the they cannot and a
> hard kill-and-recovery is required to bring it back to normal_. Hence it is
> recommended to closely monitor your zookeeper cluster and provision it so
> that it is performant.
> {quote}
> As ZK quorum failures are inevitable (due to rolling upgrades of ZK, leader
> hardware failure, etc), it would be great to identify the corner cases (if
> they still exist) and fix them if necessary.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)