Hi Colin,

I'd be interested in this too. I think it's a valuable thing to have for the community, and it would greatly increase test coverage. I saw you already have a PR; I'll give it a review when I have time :).
Viktor

On Tue, Aug 22, 2017 at 9:36 PM, Timothy Chen <tnac...@gmail.com> wrote:
> Hi Colin,
>
> The Kibosh code is just a README for now, is it going to be published soon?
>
> Tim
>
> On Tue, Aug 22, 2017 at 11:44 AM, Colin McCabe <cmcc...@apache.org> wrote:
> > Hi all,
> >
> > I've been working on a fault injector for Apache Kafka. The general
> > idea is to create faults such as network partitions or disk failures,
> > and see what happens in the cluster. The fault injector can run as part
> > of a ducktape system test, or standalone.
> >
> > The fault injector has two processes: a coordinator, and an agent. The
> > agent process is responsible for actually implementing the faults. For
> > example, it might run iptables, send signals to processes, generate a
> > lot of load, or do something else to disrupt the computer it is running
> > on. We run an agent process on each node where we would like to
> > potentially inject faults. So it will run alongside the brokers,
> > zookeeper nodes, etc.
> >
> > The coordinator process is responsible for communicating with the agent
> > processes and for scheduling faults. For example, the coordinator can
> > be instructed to create a fault immediately on several nodes. Or it can
> > be instructed to create faults over time, based on a pseudorandom seed.
> > Both the coordinator and the agent expose a REST interface that accepts
> > objects serialized via JSON.
> >
> > I think two kinds of faults will be especially interesting: network
> > faults, and disk errors. Simulating network faults in a Linux
> > environment is relatively straightforward using iptables. Disk errors
> > are tougher to simulate, but I have written a FUSE filesystem to do
> > this. The filesystem essentially simulates a bind mount in most cases,
> > but it can take a JSON specification telling it to inject certain
> > faults. (Disk errors seem especially relevant to the ongoing work on
> > JBOD.)
> >
> > Although it's not a user-visible component, I think having a fault
> > injector will be really great for Kafka users. It will really help us
> > stress test Kafka in more situations. I'm going to post some patches in
> > a day or two-- it would be great to get some feedback. Check out
> > https://cwiki.apache.org/confluence/display/KAFKA/Fault+Injection
> >
> > best,
> > Colin
>
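Colin's description of a coordinator that creates faults over time from a pseudorandom seed can be illustrated with a small Java sketch. It is hypothetical: FaultSchedule, FaultSpec, the fault type names, the node names, and the JSON field names are all assumptions made for illustration, not the fault injector's actual classes or REST payloads. It only shows how a fixed seed yields a reproducible schedule that could be serialized to JSON and sent to an agent.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: FaultSchedule, FaultSpec, the fault type names and the
// JSON field names are illustrative assumptions, not the fault injector's API.
public class FaultSchedule {

    // One scheduled fault: target node, fault kind, start offset and duration.
    static final class FaultSpec {
        final String node;
        final String type;
        final long startMs;
        final long durationMs;

        FaultSpec(String node, String type, long startMs, long durationMs) {
            this.node = node;
            this.type = type;
            this.startMs = startMs;
            this.durationMs = durationMs;
        }

        // Hand-rolled JSON (no external library) of the kind an agent's REST
        // endpoint might accept; assumes names contain no characters needing escapes.
        String toJson() {
            return String.format(
                "{\"node\":\"%s\",\"type\":\"%s\",\"startMs\":%d,\"durationMs\":%d}",
                node, type, startMs, durationMs);
        }
    }

    // Hypothetical fault kinds, loosely matching those mentioned in the thread.
    static final String[] FAULT_TYPES = {"networkPartition", "diskError", "processSignal"};

    // Builds `count` faults spread over `windowMs`. The same seed always yields
    // the same schedule, so a failing run can be replayed exactly.
    static List<FaultSpec> generate(long seed, List<String> nodes, int count, long windowMs) {
        Random rng = new Random(seed);
        List<FaultSpec> faults = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String node = nodes.get(rng.nextInt(nodes.size()));
            String type = FAULT_TYPES[rng.nextInt(FAULT_TYPES.length)];
            long startMs = (long) (rng.nextDouble() * windowMs); // in [0, windowMs)
            long durationMs = 1_000 + rng.nextInt(9_000);       // in [1 s, 10 s)
            faults.add(new FaultSpec(node, type, startMs, durationMs));
        }
        // Order by start time so a coordinator could dispatch faults sequentially.
        faults.sort(Comparator.comparingLong(f -> f.startMs));
        return faults;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("broker1", "broker2", "zk1");
        for (FaultSpec f : generate(42L, nodes, 5, 60_000L)) {
            System.out.println(f.toJson());
        }
    }
}
```

Running main prints five JSON lines, and running it again with seed 42 prints the same five lines, because java.util.Random produces an identical sequence for an identical seed. Keeping the seed as the only source of randomness is what would let a test report "seed N" and have someone else replay the exact same faults.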