Thanks, Viktor. Also check out the fault injection umbrella JIRA, which has more subtasks: https://issues.apache.org/jira/browse/KAFKA-5775
cheers,
Colin

On Fri, Sep 1, 2017, at 05:07, Viktor Somogyi wrote:
> Hi Colin,
>
> I'd be interested in this and also think it's a valuable thing to have this
> for the community and would greatly increase the test coverage.
> Saw you already have a PR, I'll give a review as I have time :).
>
> Viktor
>
> On Tue, Aug 22, 2017 at 9:36 PM, Timothy Chen <tnac...@gmail.com> wrote:
>
> > Hi Colin,
> >
> > The Kibosh code is just a README for now, is it going to be published soon?
> >
> > Tim
> >
> > On Tue, Aug 22, 2017 at 11:44 AM, Colin McCabe <cmcc...@apache.org> wrote:
> > > Hi all,
> > >
> > > I've been working on a fault injector for Apache Kafka. The general
> > > idea is to create faults such as network partitions or disk failures,
> > > and see what happens in the cluster. The fault injector can run as part
> > > of a ducktape system test, or standalone.
> > >
> > > The fault injector has two processes: a coordinator, and an agent. The
> > > agent process is responsible for actually implementing the faults. For
> > > example, it might run iptables, send signals to processes, generate a
> > > lot of load, or do something else to disrupt the computer it is running
> > > on. We run an agent process on each node where we would like to
> > > potentially inject faults. So it will run alongside the brokers,
> > > zookeeper nodes, etc.
> > >
> > > The coordinator process is responsible for communicating with the agent
> > > processes and for scheduling faults. For example, the coordinator can
> > > be instructed to create a fault immediately on several nodes. Or it can
> > > be instructed to create faults over time, based on a pseudorandom seed.
> > > Both the coordinator and the agent expose a REST interface that accepts
> > > objects serialized via JSON.
> > >
> > > I think two kinds of faults will be especially interesting: network
> > > faults, and disk errors. Simulating network faults in a Linux
> > > environment is relatively straightforward using iptables. Disk errors
> > > are tougher to simulate, but I have written a FUSE filesystem to do
> > > this. The filesystem essentially simulates a bind mount in most cases,
> > > but it can take a JSON specification telling it to inject certain
> > > faults. (Disk errors seem especially relevant to the ongoing work on
> > > JBOD.)
> > >
> > > Although it's not a user-visible component, I think having a fault
> > > injector will be really great for Kafka users. It will really help us
> > > stress test Kafka in more situations. I'm going to post some patches in
> > > a day or two-- it would be great to get some feedback. Check out
> > > https://cwiki.apache.org/confluence/display/KAFKA/Fault+Injection
> > >
> > > best,
> > > Colin
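
For readers following the thread, here is a rough sketch of the "create faults over time, based on a pseudorandom seed" idea, with the resulting schedule serialized as JSON. It is only an illustration under assumptions: FaultSpec, schedule_faults, to_json, the field names, and the fault kind strings are all hypothetical and are not taken from the actual fault injector patches or its REST interface.

```python
import json
import random
from dataclasses import dataclass, asdict

# Every name here (FaultSpec, schedule_faults, to_json, the field names and
# the fault kind strings) is a hypothetical illustration, not the fault
# injector's real API or wire format.


@dataclass(frozen=True)
class FaultSpec:
    kind: str         # e.g. "network_partition" or "disk_error"
    node: str         # node whose agent would apply the fault
    start_ms: int     # offset from the start of the test run
    duration_ms: int  # how long the fault stays active


def schedule_faults(seed, nodes, kinds, count, horizon_ms, max_duration_ms):
    """Build a reproducible fault schedule from a pseudorandom seed."""
    # A private generator: the same seed always yields the same schedule,
    # which makes a failing run replayable.
    rng = random.Random(seed)
    faults = [
        FaultSpec(
            kind=rng.choice(kinds),
            node=rng.choice(nodes),
            start_ms=rng.randrange(horizon_ms),
            duration_ms=rng.randrange(1, max_duration_ms + 1),
        )
        for _ in range(count)
    ]
    # Order by start time so a coordinator could walk the list in sequence.
    return sorted(faults, key=lambda f: f.start_ms)


def to_json(faults):
    """Serialize a schedule as a JSON array of fault objects."""
    return json.dumps([asdict(f) for f in faults], sort_keys=True)


if __name__ == "__main__":
    schedule = schedule_faults(
        seed=42,
        nodes=["broker1", "broker2", "zk1"],
        kinds=["network_partition", "disk_error"],
        count=3,
        horizon_ms=60_000,
        max_duration_ms=5_000,
    )
    print(to_json(schedule))
```

Seeding a private random.Random instance rather than the module-level generator keeps the schedule independent of any other randomness in the test process, so rerunning with the same seed reproduces the same sequence of faults.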