[ https://issues.apache.org/jira/browse/KAFKA-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541127#comment-14541127 ]
Geoffrey Anderson commented on KAFKA-1918:
------------------------------------------
Just making sure you're aware of work we're doing at Confluent on system tests.
I'll be posting a KIP for this soon, but here's some info:
The original plan is sketched here:
https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
This is the core library/test framework (WIP), which aids in writing and running
the tests:
https://github.com/confluentinc/ducktape/
This repository has the system tests we've written to date for the Confluent Platform:
https://github.com/confluentinc/muckrake
> System test for ZooKeeper quorum failure scenarios
> --------------------------------------------------
>
> Key: KAFKA-1918
> URL: https://issues.apache.org/jira/browse/KAFKA-1918
> Project: Kafka
> Issue Type: Test
> Reporter: Omid Aladini
>
> Following up on the [conversation on the mailing
> list|http://mail-archives.apache.org/mod_mbox/kafka-users/201502.mbox/%3CCAHwHRrX3SAWDUGF5LjU4rrMUsqv%3DtJcyjX7OENeL5C_V5o3tCw%40mail.gmail.com%3E],
> the FAQ states:
> {quote}
> Once the Zookeeper quorum is down, brokers could result in a bad state and
> could not normally serve client requests, etc. Although when Zookeeper quorum
> recovers, the Kafka brokers should be able to resume to normal state
> automatically, _there are still a few +corner cases+ the they cannot and a
> hard kill-and-recovery is required to bring it back to normal_. Hence it is
> recommended to closely monitor your zookeeper cluster and provision it so
> that it is performant.
> {quote}
> As ZK quorum failures are inevitable (due to rolling upgrades of ZK, leader
> hardware failure, etc.), it would be great to identify the corner cases (if
> they still exist) and fix them if necessary.